Abstract — This paper introduces a novel technique for access 
by a cognitive Secondary User (SU) using best-effort transmission 
to a spectrum with an incumbent Primary User (PU), which 
uses Type-I Hybrid ARQ. The technique leverages the primary 
ARQ protocol to perform Interference Cancellation (IC) at 
the SU receiver (SUrx). Two IC mechanisms that work in 
concert are introduced: Forward IC, where SUrx, after decod- 
ing the PU message, cancels its interference in the (possible) 
following PU retransmissions of the same message, to improve 
the SU throughput; Backward IC, where SUrx performs IC 
on previous SU transmissions, whose decoding failed due to 
severe PU interference. Secondary access policies are designed 
that determine the secondary access probability in each state 
of the network so as to maximize the average long-term SU 
throughput by opportunistically leveraging IC, while causing 
bounded average long-term PU throughput degradation and SU 
power expenditure. It is proved that the optimal policy prescribes 
that the SU prioritizes its access in the states where SUrx knows 
the PU message, thus enabling IC. An algorithm is provided to 
optimally allocate additional secondary access opportunities in 
the states where the PU message is unknown. Numerical results 
are shown to assess the throughput gain provided by the proposed 
techniques. 

Index Terms — Cognitive radios, resource allocation, Markov 
decision processes, ARQ, interference cancellation 



I. Introduction 

Cognitive Radios (CRs) [3] offer a novel paradigm for improving
the efficiency of spectrum usage in wireless networks.
Smart users, referred to as Secondary Users (SUs), adapt their
operation in order to opportunistically leverage the channel
resource while generating bounded interference to the Primary
Users (PUs) [4]-[6]. For a survey on cognitive radio, dynamic
spectrum access and the related research challenges, we refer
the interested reader to [6]-[9].

In a standard model for cognitive radio, the PU is a legacy 
system oblivious to the presence of the SU, which needs to 
satisfy given constraints on the performance loss caused to 
the PU (underlay cognitive radio paradigm [8]). Within this
framework, we propose to exploit the intrinsic redundancy, in 
the form of copies of PU packets, introduced by the Type-I Hybrid
Automatic Retransmission reQuest (Type-I HARQ [10])
protocol implemented by the PU by enabling Interference
Cancellation (IC) at the SU receiver (SUrx). We introduce
two IC schemes that work in concert, both enabled by the
underlying retransmission process of the PU. With Forward IC
(FIC), SUrx, after decoding the PU message, performs IC in
the next PU retransmission attempts, if these occur. While FIC
provides IC on SU transmissions performed in future time-slots,
Backward IC (BIC) provides IC on SU transmissions
performed in previous time-slots within the same primary
ARQ retransmission window, whose decoding failed due to
severe interference from the PU. BIC relies on buffering of the
received signals. Based on these IC schemes, we model the
state evolution of the PU-SU network as a Markov Decision
Process [11], [12], induced by the specific access policy used
by the SU, which determines its access probability in each
state of the network. Following the approach put forth by [13],
we study the problem of designing optimal secondary access 
policies that maximize the average long-term SU throughput 
by opportunistically leveraging FIC and BIC, while causing 
a bounded average long-term throughput loss to the PU and 
a bounded average long-term SU power expenditure. We 
show that the optimal strategy dictates that the SU prioritizes 
its channel access in the states where SUrx knows the PU 
message, thus enabling IC; moreover, we provide an algorithm 
to optimally allocate additional secondary access opportunities 
in the states where the PU message is unknown. 

The idea of exploiting PU retransmissions to perform IC 
on future packets (similar to our FIC mechanism) was put 
forth by [14], which devises several cognitive radio protocols
exploiting the hybrid ARQ retransmissions of the PU. Therein, 
the PU employs hybrid ARQ with incremental redundancy and 
the ARQ mechanism is limited to at most one retransmission. 
The SU receiver attempts to decode the PU message in the first 
time-slot. If successful, the SU transmitter sends its packet and 
the SU receiver decodes it by using IC on the received signal. 
In contrast, in this work, we address the more general case 
of an arbitrary number of primary ARQ retransmissions, and 
we allow a more general access pattern for the SU pair over 
the entire primary ARQ window. We also model the interplay 
between the primary ARQ protocol and the activity of the 
SU, by allowing for BIC. It should be noted that IC-related
schemes are also used in other contexts, e.g., decoding for
graphical codes [15] and multiple access protocols [16].

Fig. 1. System model

Other related works include [17], which devises an opportunistic
sharing scheme with channel probing based on the
ARQ feedback from the PU receiver. An information theoretic 
framework for cognitive radio is investigated in [18], where
the SU transmitter has non-causal knowledge of the PU's
codeword. In [19], the data transmitted by the PU is obtained
causally at the SU receiver. However, this model requires a
joint design of the PU and SU signaling and channel state 
information at the transmitters. In contrast, in our work we 
explicitly model the dynamic acquisition of the PU message 
at the SU receiver, which enables IC. Moreover, the PU is 
oblivious to the presence of the SU. 

The paper is organized as follows. Sec. II presents the
system model. Sec. III introduces the secondary access policy,
the performance metrics and the optimization problem, which
is addressed in Sec. IV. Sec. VI presents and discusses the
numerical results. Finally, Sec. VII concludes the paper. The
proofs of the lemmas and theorems are provided in the
appendix.

II. System Model 

We consider a two-user interference network, as depicted 
in Fig. 1, where a primary transmitter and a secondary
transmitter, denoted by PUtx and SUtx, respectively, transmit
to their respective receivers, PUrx and SUrx, over the direct
links PUtx→PUrx and SUtx→SUrx. Their transmissions
generate mutual interference over the links PUtx→SUrx and
SUtx→PUrx.

Time is divided into time-slots of fixed duration. Each time- 
slot matches the length of the PU and SU packets, and the 
transmissions of the PU and SU are assumed to be perfectly 
synchronized. We adopt the block-fading channel model, i.e., 
the channel gains are constant within the time-slot duration, 
and change from time-slot to time-slot. Assuming that the 
SU and the PU transmit with constant power $P_s$ and $P_p$,
respectively, and that noise at the receivers is zero mean
Gaussian with variance $\sigma^2$, we define the instantaneous Signal
to Noise Ratios (SNR) of the links SUtx→SUrx, PUtx→PUrx,
SUtx→PUrx and PUtx→SUrx, during the $n$th time-slot, as
$\gamma_s(n)$, $\gamma_p(n)$, $\gamma_{sp}(n)$ and $\gamma_{ps}(n)$, respectively. We model the
SNR process $\{\gamma_x(n), n = 0, 1, \dots\}$, where $x \in \{s, p, sp, ps\}$,
as i.i.d. over time-slots and independent over the different
links, and we denote the average SNR as $\bar{\gamma}_x = \mathbb{E}[\gamma_x]$.

We assume that no Channel State Information (CSI) is 
available at the transmitters, so that the latter cannot allocate 
their rate based on the instantaneous link quality, to ensure 
correct delivery of the packets to their respective receivers. 
Transmissions may thus undergo outage, when the selected 
rate is not supported by the current channel quality. 

In order to improve reliability, the PU employs Type-I
HARQ [10] with deadline $D > 1$, i.e., at most $D$ transmissions
of the same PU message can be performed, after which the
packet is discarded and a new transmission is performed (the
PU is assumed to be backlogged). We define the primary ARQ
state $t \in \mathbb{N}(1, D)$^1 as the number of ARQ transmission attempts
already performed on the current PU message, plus the
current one. Namely, $t = 1$ indicates a new PU transmission,
and the counter $t$ is increased at each ARQ retransmission,
until the deadline $D$ is reached. We assume that the ARQ
feedback is received at the PU transmitter by the end of 
the time-slot, so that, if requested, a retransmission can be 
performed in the next time-slot. 

On the other hand, the SU, in each time-slot, either accesses 
the channel by transmitting its own message, or stays idle. This 
decision is based on the access policy $\mu$, defined in Sec. III.
The activity of the SU, which is governed by $\mu$, affects the
outage performance of the PU, by creating interference to the
PU over the link SUtx→PUrx. We denote the primary outage
probability when the SU is idle and accesses the channel,
respectively, as^2

$q_{pp}^{(I)}(R_p) \triangleq \Pr\left(R_p > C(\gamma_p)\right),$

$q_{pp}^{(A)}(R_p) \triangleq \Pr\left(R_p > C\left(\frac{\gamma_p}{1 + \gamma_{sp}}\right)\right),$ (1)

where $R_p$ denotes the PU transmission rate, measured in
bits/s/Hz, $C(x) = \log_2(1 + x)$ is the (normalized) capacity
of the Gaussian channel with SNR $x$ at the receiver [20].
This outage definition, as well as the ones introduced later on,
assumes the use of Gaussian signaling and capacity-achieving
coding with sufficiently long codewords. However, our analysis
can be extended to include practical codes by computing
the outage probabilities for the specific code considered. In (1),
it is assumed that SU transmissions are treated as background
Gaussian noise by the PU. This is a reasonable assumption in
CRs in which the PU is oblivious to the presence of SUs. In
general, we have $q_{pp}^{(A)}(R_p) \geq q_{pp}^{(I)}(R_p)$, where equality holds if
and only if $\gamma_{sp} = 0$ deterministically. We denote the expected
PU throughput accrued in each time-slot, when the SU is idle
and accesses the channel, as $T_p^{(I)}(R_p) = R_p[1 - q_{pp}^{(I)}(R_p)]$ and
$T_p^{(A)}(R_p) = R_p[1 - q_{pp}^{(A)}(R_p)]$, respectively.

^1 We define $\mathbb{N}(n_0, n_1) = \{t \in \mathbb{N}, n_0 \leq t \leq n_1\}$ for $n_0 \leq n_1 \in \mathbb{N}$.
^2 Herein, we denote the outage probability as $q_{xy}^{(Z)}$, where $x$ and $y$ are the
source and the recipient of the message, respectively (PU if $x, y = p$, SU
if $x, y = s$), and $Z \in \{A, I\}$ denotes the action of the SU (A if the SU is
active and it accesses the channel, I if the SU remains idle). For example,
$q_{ps}^{(A)}$ is the probability that the PU message is in outage at SUrx, when SUtx
transmits.
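
As a numerical illustration of the outage definitions in (1), the following sketch estimates both PU outage probabilities by Monte Carlo. The text only assumes i.i.d. SNRs, so the exponential (Rayleigh-fading) SNR distribution, the function names and all parameter values here are assumptions made for the example, not part of the model.

```python
import math
import random


def capacity(snr):
    """Normalized Gaussian-channel capacity C(x) = log2(1 + x), in bits/s/Hz."""
    return math.log2(1.0 + snr)


def pu_outage_probabilities(rate_p, mean_snr_p, mean_snr_sp,
                            n_samples=100_000, seed=0):
    """Monte Carlo estimates of (q_pp^(I), q_pp^(A)) as defined in (1).

    Assumption (not from the text): SNRs are exponentially distributed
    (Rayleigh fading) with the given means and independent across links.
    """
    rng = random.Random(seed)
    outage_idle = 0
    outage_active = 0
    for _ in range(n_samples):
        snr_p = rng.expovariate(1.0 / mean_snr_p)
        snr_sp = rng.expovariate(1.0 / mean_snr_sp)
        # SU idle: only noise at PUrx.
        if rate_p > capacity(snr_p):
            outage_idle += 1
        # SU active: its signal is treated as extra Gaussian noise at PUrx.
        if rate_p > capacity(snr_p / (1.0 + snr_sp)):
            outage_active += 1
    return outage_idle / n_samples, outage_active / n_samples


def pu_throughput(rate_p, outage):
    """Expected PU throughput per time-slot, T_p = R_p * (1 - q_pp)."""
    return rate_p * (1.0 - outage)
```

Because the interfered SNR never exceeds the interference-free one, the second estimate is never smaller than the first, in line with $q_{pp}^{(A)}(R_p) \geq q_{pp}^{(I)}(R_p)$.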
A. Operation of the SU

Unlike the PU that uses a simple Type-I Hybrid ARQ 
mechanism, it is assumed that the SU uses "best effort" trans- 
mission. Moreover, the SU is provided with side-information 
about the PU, e.g., ARQ deadline D, PU codebook and 
feedback information from PUrx (ACK/NACK messages). 
This is consistent with the common characterization of the 
PU as a legacy system, and of the SU as an opportunistic and 
cognitive system, which exploits the primary ARQ feedback 
to create a best-effort link with maximized throughput, while 
the flow control mechanisms are left to the upper layers. By 
overhearing the feedback information from PUrx, the SU can 
thus track the primary ARQ state t. Moreover, by leveraging 
the PU codebook, SUrx attempts, in any time-slot, to decode 
the PU message, which enables the following IC techniques 
at SUrx: 

• Forward IC (FIC): by decoding the PU message, SUrx 
can perform IC in the current as well as in the following 
ARQ retransmissions, if these occur, to achieve a larger 
SU throughput; 

• Backward IC (BIC): SUrx buffers the received signals 
corresponding to SU transmissions which undergo outage 
due to severe interference from the PU. These transmis- 
sions can later be recovered using IC on the buffered 
received signals, if the interfering PU message is suc- 
cessfully decoded by SUrx in a subsequent primary ARQ 
retransmission attempt. 

We define the SU buffer state $b \in \mathbb{N}(0, B)$ as the number
of received signals currently buffered at SUrx, where $B \in \mathbb{N}(0, D - 1)$
denotes the buffer size.^3 Moreover, we define the
PU message knowledge state $\Phi \in \{K, U\}$, which denotes the
knowledge at SUrx about the PU message currently handled by
the PU. Namely, if $\Phi = K$, then SUrx knows the PU message,
thus enabling FIC/BIC; conversely ($\Phi = U$), the PU message
is unknown to SUrx.

Remark 1 (Feedback Information). Note that PUrx needs 
to report one feedback bit to inform PUtx (and the SU, 
which overhears the feedback) on the transmission outcome 
(ACK/NACK). On the other hand, two feedback bits need to 
be reported by SUrx to SUtx: one bit to inform SUtx as to 
whether the PU message has been successfully decoded, so 
that SUtx can track the PU message knowledge state $\Phi$, and
one bit to inform SUtx as to whether the received signal has
been buffered, so that SUtx can track the SU buffer state $b$.
Herein, we assume ideal (error-free) feedback channels, so
that the SU can track $(t, b, \Phi)$, and the PU can track the
ARQ state $t$. However, optimization is possible with imperfect
observations as well [21]. □

We now further detail the operation of the SU for
$\Phi \in \{K, U\}$.

1) PU message unknown to SUrx ($\Phi = U$): When $\Phi = U$
and the SU is idle, SUrx attempts to decode the PU message,
so as to enable FIC/BIC. A decoding failure occurs if the rate
of the PU message, $R_p$, exceeds the capacity of the channel
PUtx→SUrx, with SNR $\gamma_{ps}$. We denote the corresponding
outage probability as $q_{ps}^{(I)}(R_p) = \Pr(R_p > C(\gamma_{ps}))$.

^3 Note that $B \leq D - 1$, since the same PU message is transmitted at
most $D$ times by PUtx. Once the ARQ deadline $D$ is reached, a new PU
transmission occurs, and the buffer is emptied.

[Figure 2: rate plane with regions labeled "PU and SU messages jointly decoded", "PU message decoded, SU interference treated as noise (SU message undecoded)", "SU message decoded, PU interference treated as noise (PU message undecoded)", "PU and SU messages undecoded: rx signal is buffered for BIC recovery" and "PU and SU messages undecoded: capacity of interference free channels exceeded"; boundary $R_{sU} = C(\gamma_s/(1 + \gamma_{ps}))$.]

Fig. 2. Decodability regions for PU message (rate $R_p$) and SU message
(rate $R_{sU}$) at SUrx, for a fixed SNR pair $(\gamma_s, \gamma_{ps})$

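If the SU accesses the channel, SU transmissions are performed
with rate $R_{sU}$ (bits/s/Hz) and are interfered by the PU.
SUrx thus attempts to decode both the SU and PU messages;
moreover, if the decoding of the SU message fails due to
severe interference from the PU, the received signal is buffered
for future BIC recovery. Using standard information-theoretic
results [20], with the help of Fig. 2, we define the following
SNR regions associated with the decodability of the SU and
PU messages at SUrx, where $A^c$ denotes the complementary
set of $A$:^4

$\Gamma_p(R_{sU}, R_p) \triangleq \{(\gamma_s, \gamma_{ps}) : R_{sU} \leq C(\gamma_s), R_p \leq C(\gamma_{ps}), R_{sU} + R_p \leq C(\gamma_s + \gamma_{ps})\}$ (2)

$\quad \cup \left\{(\gamma_s, \gamma_{ps}) : R_{sU} > C(\gamma_s), R_p \leq C\left(\frac{\gamma_{ps}}{1 + \gamma_s}\right)\right\},$ (3)

$\Gamma_s(R_{sU}, R_p) \triangleq \{(\gamma_s, \gamma_{ps}) : R_{sU} \leq C(\gamma_s), R_p \leq C(\gamma_{ps}), R_{sU} + R_p \leq C(\gamma_s + \gamma_{ps})\}$ (4)

$\quad \cup \left\{(\gamma_s, \gamma_{ps}) : R_p > C(\gamma_{ps}), R_{sU} \leq C\left(\frac{\gamma_s}{1 + \gamma_{ps}}\right)\right\},$ (5)

$\Gamma_{buf}(R_{sU}, R_p) \triangleq \left(\Gamma_p(R_{sU}, R_p) \cup \Gamma_s(R_{sU}, R_p)\right)^c \cap \{(\gamma_s, \gamma_{ps}) : R_{sU} \leq C(\gamma_s)\}.$ (6)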

The SNR regions (2) and (4) guarantee that the two rates $R_p$
and $R_{sU}$ are within the multiple access channel region formed
by the two transmitters (PUtx and SUtx) and SUrx [20], so that
both the SU and PU messages are correctly decoded via joint
decoding techniques. On the other hand, in the SNR region (5)
(respectively, (3)), only the SU (PU) message is successfully
decoded at SUrx by treating the interference from the PU (SU)
as background noise. If the SNR pair falls outside the two
regions (4) and (5) (respectively, (2) and (3)), then SUrx incurs
a failure in decoding the SU (PU) message. Therefore, when
$(\gamma_s, \gamma_{ps}) \in \Gamma_s(R_{sU}, R_p)$, SUrx successfully decodes the SU
message. The corresponding expected SU throughput is thus
given by

$T_{sU}(R_{sU}, R_p) \triangleq R_{sU} \Pr\left((\gamma_s, \gamma_{ps}) \in \Gamma_s(R_{sU}, R_p)\right).$ (7)

Similarly, when $(\gamma_s, \gamma_{ps}) \in \Gamma_p(R_{sU}, R_p)$,
SUrx successfully decodes the PU message. We
denote the corresponding outage probability as
$q_{ps}^{(A)}(R_{sU}, R_p) \triangleq \Pr\left((\gamma_s, \gamma_{ps}) \notin \Gamma_p(R_{sU}, R_p)\right)$. Note
that $q_{ps}^{(A)}(R_{sU}, R_p) \geq q_{ps}^{(I)}(R_p)$, since SU transmissions
interfere with the decoding of the PU message.

^4 Herein, we assume optimal joint decoding techniques of the SU and PU
messages. Using other techniques, e.g., successive IC, the SNR regions may
change accordingly, without providing any further insights in the following
analysis.

Finally, in (6), the decoding of both the SU and PU
messages fails, since the SNR pair $(\gamma_s, \gamma_{ps})$ falls outside both
regions $\Gamma_p(R_{sU}, R_p)$ and $\Gamma_s(R_{sU}, R_p)$. However, the rate $R_{sU}$
is within the capacity region of the interference free channel
($R_{sU} \leq C(\gamma_s)$), so that the SU message can be recovered via
BIC, should the PU message become available in a future ARQ
retransmission attempt. The received signal is thus buffered at
SUrx. We denote the buffering probability as

$p_{s,buf}(R_{sU}, R_p) \triangleq \Pr\left((\gamma_s, \gamma_{ps}) \in \Gamma_{buf}(R_{sU}, R_p)\right)$

$\quad = \Pr\left((\gamma_s, \gamma_{ps}) \in \Gamma_s(R_{sU}, 0)\right) - \Pr\left((\gamma_s, \gamma_{ps}) \in \Gamma_s(R_{sU}, R_p)\right) \geq 0,$ (8)

where the second equality follows from inspection of Fig. 2.
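
The decodability regions (2)-(6) can be sketched as a small classifier that, for one SNR pair, reports which messages SUrx decodes and whether the received signal is buffered. This is only an illustration: the function names, the exponential SNR distribution used in the Monte Carlo helper and its default parameters are assumptions, not part of the model.

```python
import math
import random


def capacity(snr):
    """Normalized Gaussian-channel capacity C(x) = log2(1 + x)."""
    return math.log2(1.0 + snr)


def classify(snr_s, snr_ps, rate_su, rate_p):
    """Return (su_decoded, pu_decoded, buffered) for one SNR pair, per (2)-(6)."""
    # Joint decoding: both rates inside the multiple access region, (2) and (4).
    joint = (rate_su <= capacity(snr_s)
             and rate_p <= capacity(snr_ps)
             and rate_su + rate_p <= capacity(snr_s + snr_ps))
    # (3): SU rate not supported; PU decoded treating the SU signal as noise.
    pu_only = (rate_su > capacity(snr_s)
               and rate_p <= capacity(snr_ps / (1.0 + snr_s)))
    # (5): PU rate not supported; SU decoded treating the PU signal as noise.
    su_only = (rate_p > capacity(snr_ps)
               and rate_su <= capacity(snr_s / (1.0 + snr_ps)))
    pu_decoded = joint or pu_only
    su_decoded = joint or su_only
    # (6): both fail, but the SU rate fits the interference free channel.
    buffered = (not pu_decoded and not su_decoded
                and rate_su <= capacity(snr_s))
    return su_decoded, pu_decoded, buffered


def estimate_buffering(rate_su, rate_p, mean_s, mean_ps, n=20_000, seed=0):
    """Estimate p_s,buf directly and via the right-hand side of (8).

    Assumption: exponentially distributed SNRs with the given means.
    """
    rng = random.Random(seed)
    direct = free = su_ok = 0
    for _ in range(n):
        snr_s = rng.expovariate(1.0 / mean_s)
        snr_ps = rng.expovariate(1.0 / mean_ps)
        su, _, buf = classify(snr_s, snr_ps, rate_su, rate_p)
        direct += buf
        free += rate_su <= capacity(snr_s)   # event Gamma_s(R_sU, 0)
        su_ok += su                          # event Gamma_s(R_sU, R_p)
    return direct / n, (free - su_ok) / n
```

For instance, with both rates equal to 2 bits/s/Hz, the pair `(3.5, 3.5)` supports each rate alone but not their sum, so `classify(3.5, 3.5, 2.0, 2.0)` reports a buffered signal with neither message decoded.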

2) PU message known to SUrx ($\Phi = K$): When $\Phi = K$,
SUrx performs FIC on the received signal, thus enabling
interference free SU transmissions. The SU transmits with rate
$R_{sK}$, and the accrued throughput is given by
$T_{sK}(R_{sK}) \triangleq R_{sK} \Pr(R_{sK} \leq C(\gamma_s))$.

We now provide an example to illustrate the use of FIC/BIC 
at SUrx. 

Example 1. Consider a sequence of 3 primary retransmis- 
sion attempts in which the SU always accesses the channel. 
Initially, the PU message is unknown to SUrx, hence the PU 
message knowledge state is set to $\Phi = U$ in the first time-slot,
and the SU transmits with rate $R_{sU}$. Assume that the SNR pair
$(\gamma_s(1), \gamma_{ps}(1))$ falls in $\Gamma_{buf}(R_{sU}, R_p)$. Then, neither the SU
nor the PU messages are successfully decoded by SUrx, but the
received signal is buffered for future BIC recovery. In the second
time-slot, $(\gamma_s(2), \gamma_{ps}(2)) \in \Gamma_s(R_{sU}, R_p) \cap \Gamma_p(R_{sU}, R_p)$,
hence both the SU and PU messages are correctly decoded
by SUrx, and the PU message knowledge state switches to
$\Phi = K$. At this point, SUrx performs BIC on the previously
buffered received signal to recover the corresponding SU
message. In the third time-slot, SUtx transmits with rate $R_{sK}$,
and decoding at SUrx takes place after cancellation of the 
interference from the PU via FIC. □ 

We now briefly elaborate on the choice of the transmission
rate $R_{sK}$. Since its value does not affect the outage performance
at PUrx (1) and the evolution of the ARQ process, $R_{sK}$
is chosen so as to maximize $T_{sK}(R_{sK})$. Therefore, from (8)
we obtain

$T_{sK}(R_{sK}) \geq T_{sK}(R_{sU}) = T_{sU}(R_{sU}, R_p) + p_{s,buf}(R_{sU}, R_p) R_{sU} \geq T_{sU}(R_{sU}, R_p).$ (9)

Conversely, the choice of the rate $R_{sU}$ is not as straightforward,
since its value reflects a trade-off between the potentially
larger throughput accrued with a larger rate $R_{sU}$ and the
corresponding diminished capabilities for IC caused by the
more difficult decoding of the PU message by SUrx.

In the following treatment, the rates $R_{sK}$, $R_{sU}$ and $R_p$
are assumed to be fixed parameters of the system, and they
are not considered part of the optimization (see Sec. VI for
further elaboration in this regard). For the sake of notational
convenience, we omit the dependence of the quantities defined
above on them. Moreover, for clarity, we consider the case
$B = D - 1$ in which SUrx can buffer up to $D - 1$ received
signals. However, the following analysis can be extended to a
generic value of $B$.

III. Policy Definition and Optimization Problem 

We model the evolution of the network as a Markov
Decision Process [11], [12]. Namely, we denote the state of
the PU-SU system by the tuple $(t, b, \Phi)$, where $t \in \mathbb{N}(1, D)$
is the primary ARQ state, $b \in \mathbb{N}(0, B)$ is the SU buffer
state and $\Phi \in \{U, K\}$ is the PU message knowledge state.
$(t, b, \Phi)$ takes values in the state space $\mathcal{S} = \mathcal{S}_U \cup \mathcal{S}_K$, where
$\mathcal{S}_K = \{(t, 0, K) : t \in \mathbb{N}(2, D)\}$ and
$\mathcal{S}_U = \{(t, b, U) : t \in \mathbb{N}(1, D), b \in \mathbb{N}(0, t - 1)\}$ are the sets of states where the PU
message is known and unknown to SUrx, respectively.

The SU follows a stationary randomized access policy
$\mu \in \mathcal{U} \equiv \{\mu : \mathcal{S} \to [0, 1]\}$, which determines the secondary access
probability for each state $s \in \mathcal{S}$. Note that, from [22], this
choice is without loss of optimality for the specific problem
at hand. Namely, in state $(t, b, \Phi) \in \mathcal{S}$, the SU is "active", i.e.,
it accesses the channel, with probability $\mu(t, b, \Phi)$ and stays
"idle" with probability $1 - \mu(t, b, \Phi)$. We denote the "active"
and "idle" actions as A and I, respectively.

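With these definitions at hand, we define the following
average long-term metrics under $\mu$: the SU throughput $\bar{T}_s(\mu)$,
the SU power expenditure $\bar{P}_s(\mu)$ and the PU throughput
$\bar{T}_p(\mu)$, given by

$\bar{T}_s(\mu) = \lim_{N \to +\infty} \frac{1}{N} \mathbb{E}\left[\sum_{n=0}^{N-1} R_{s\Phi_n} 1\left(\{Q_n = A\} \cap \bar{O}_{s,n}\right) \Big| s_0\right] + \lim_{N \to +\infty} \frac{1}{N} \mathbb{E}\left[\sum_{n=0}^{N-1} R_{sU} B_n 1\left(\bar{O}_{ps,n}\right) \Big| s_0\right],$ (10)

$\bar{P}_s(\mu) = P_s \lim_{N \to +\infty} \frac{1}{N} \mathbb{E}\left[\sum_{n=0}^{N-1} 1\left(\{Q_n = A\}\right) \Big| s_0\right],$ (11)

$\bar{T}_p(\mu) = R_p \lim_{N \to +\infty} \frac{1}{N} \mathbb{E}\left[\sum_{n=0}^{N-1} 1\left(\bar{O}_{p,n}\right) \Big| s_0\right],$ (12)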

where $n$ is the time-slot index, $s_0 \in \mathcal{S}$ is the initial state in
time-slot 0; $\Phi_n \in \{K, U\}$ is the PU message knowledge state
and $B_n$ is the SU buffer state in time-slot $n$; $Q_n \in \{A, I\}$
is the action of the SU, drawn according to the access policy $\mu$;
$O_{s,n}$ and $O_{ps,n}$ denote the outage events at SUrx for the
decoding of the SU and PU messages, so that $\bar{O}_{s,n}$ and $\bar{O}_{ps,n}$
denote successful decoding of the SU and PU messages by
SUrx, respectively; $O_{p,n}$ denotes the outage event at PUrx, so
that $\bar{O}_{p,n}$ denotes successful decoding of the PU message by
PUrx; and $1(E)$ is the indicator function of the event $E$. Note
that all the quantities defined above are independent of the
initial state $s_0$. In fact, starting from any $s_0 \in \mathcal{S}$, the system
reaches with probability 1 the positive recurrent state $(1, 0, U)$
(new PU transmission) within a finite number of time-slots,
due to the ARQ deadline. Due to the Markov property, from
this state on, the evolution of the process is independent of
the initial transient behavior, which has no effect on the time
averages defined in (10)-(12).

In this work, we study the problem of maximizing the
average long-term SU throughput subject to constraints on
the average long-term PU throughput loss and SU power.
Specifically,

$\mu^* = \arg\max_{\mu \in \mathcal{U}} \bar{T}_s(\mu)$ s.t. $\bar{T}_p(\mu) \geq T_p^{(I)}(1 - \epsilon_{PU}),$ $\bar{P}_s(\mu) \leq P_s^{(th)},$ (13)

where $\epsilon_{PU} \in [0, 1]$ and $P_s^{(th)} \in [0, P_s]$ represent the (normalized)
maximum tolerated PU throughput loss with respect
to the case in which the SU is idle and the SU power
constraint, respectively. This problem entails a trade-off in the
operation of the SU. On the one hand, the SU is incentivized
to transmit in order to increase its throughput and to optimize
the buffer occupancy at SUrx (i.e., failed SU transmissions
which are potentially recovered via BIC). On the other hand,
SU transmissions might jeopardize the correct decoding of the
PU message at SUrx, thus impairing the use of FIC/BIC, and
might violate the constraints in (13).

Under $\mu \in \mathcal{U}$, the state process is a stationary Markov chain,
with steady state distribution $\pi_\mu$ [12], [23]. $\pi_\mu(s)$, $s \in \mathcal{S}$, is
the long-term fraction of the time-slots spent in state $s$, i.e.,
$\pi_\mu(s) = \lim_{N \to +\infty} \frac{1}{N} \sum_{n=0}^{N-1} \Pr^{(n)}(s | s_0)$, where $\Pr^{(n)}(s | s_0)$ is
the $n$-step transition probability of the chain from state $s_0$.^5
In state $(t, b, U)$, the SU accesses the channel with probability
$\mu(t, b, U)$, thus accruing the throughput $\mu(t, b, U) T_{sU}$.
Moreover, if SUrx successfully decodes the PU message (with
probability $1 - q_{ps}^{(I)} - \mu(t, b, U)(q_{ps}^{(A)} - q_{ps}^{(I)})$), $b R_{sU}$ bits are
recovered by performing BIC on the buffered received signals,
yielding an additional BIC throughput. Similarly, in state
$(t, 0, K)$, the SU accrues the throughput $\mu(t, 0, K) T_{sK}$. Then,
we can rewrite (10) and (11) in terms of the steady state
distribution and of the cost/reward in each state as

$\bar{T}_s(\mu) = T_{sU} W_s(\mu) + F_s(\mu) + B_s(\mu), \quad \bar{P}_s(\mu) = P_s W_s(\mu),$ (14)

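where the SU access rate $W_s(\mu)$, i.e., the average long-term
number of secondary channel accesses per time-slot, the FIC
throughput $F_s(\mu)$ and the BIC throughput $B_s(\mu)$ are defined
as

$W_s(\mu) = \sum_{s \in \mathcal{S}} \pi_\mu(s) \mu(s),$

$F_s(\mu) = \sum_{t=2}^{D} \pi_\mu(t, 0, K) \mu(t, 0, K) (T_{sK} - T_{sU}),$

$B_s(\mu) = \sum_{t=1}^{D} \sum_{b=0}^{t-1} \pi_\mu(t, b, U) \, b R_{sU} \left[1 - q_{ps}^{(I)} - \mu(t, b, U)\left(q_{ps}^{(A)} - q_{ps}^{(I)}\right)\right].$ (15)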

In (14), $T_{sU} W_s(\mu)$ is the SU throughput attained without
FIC/BIC, while the terms $F_s(\mu)$ and $B_s(\mu)$ account for the
throughput gains of FIC and BIC, respectively. Conversely, the
PU accrues the throughput $T_p^{(I)}$ if the SU is idle and $T_p^{(A)}$ if
the SU accesses the channel, so that (12) is given by

$\bar{T}_p(\mu) = T_p^{(I)} - (T_p^{(I)} - T_p^{(A)}) W_s(\mu).$ (16)

The quantity $(T_p^{(I)} - T_p^{(A)}) W_s(\mu)$ is referred to as the PU
throughput loss induced by the secondary access policy $\mu$ [13].
The following result follows directly from (13), (14) and (16).



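Lemma 1. The problem (13) is equivalent to

$\mu^* = \arg\max_{\mu \in \mathcal{U}} \bar{T}_s(\mu)$ (17)

s.t. $W_s(\mu) \leq \min\left\{\frac{(1 - q_{pp}^{(I)}) \epsilon_{PU}}{q_{pp}^{(A)} - q_{pp}^{(I)}}, \frac{P_s^{(th)}}{P_s}\right\} \triangleq \epsilon_W.$

□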
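
As a small numerical sketch of Lemma 1, the helper below evaluates the access rate bound $\epsilon_W$ by combining the PU throughput constraint, rewritten through (16), with the SU power constraint, rewritten through (14). The function and argument names are illustrative, and treating $q_{pp}^{(A)} = q_{pp}^{(I)}$ as an inactive PU constraint is an assumption made to avoid a division by zero.

```python
import math


def access_rate_bound(q_idle, q_active, eps_pu, power_threshold, power_su):
    """Upper bound eps_W on the SU access rate W_s, following Lemma 1.

    q_idle, q_active: PU outage probabilities q_pp^(I) and q_pp^(A).
    eps_pu: normalized maximum tolerated PU throughput loss, in [0, 1].
    power_threshold, power_su: P_s^(th) and P_s.
    """
    if q_active < q_idle:
        raise ValueError("expected q_active >= q_idle")
    if q_active == q_idle:
        # Assumption: SU accesses do not hurt the PU, so only power binds.
        pu_bound = math.inf
    else:
        pu_bound = (1.0 - q_idle) * eps_pu / (q_active - q_idle)
    power_bound = power_threshold / power_su
    return min(pu_bound, power_bound)
```

For example, with $q_{pp}^{(I)} = 0.1$, $q_{pp}^{(A)} = 0.3$ and $\epsilon_{PU} = 0.2$, the PU constraint alone allows an access rate of about 0.9, so a power budget of $P_s^{(th)} = 0.5 P_s$ is the binding constraint.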



In the next section, we characterize the solution of (17). We
will need the following definition.

Definition 1. Let $\mu$ be the policy such that secondary access
takes place if and only if the PU message is known to SUrx,
i.e., $\mu(s) = 1$, $\forall s \in \mathcal{S}_K$, $\mu(s) = 0$, $\forall s \in \mathcal{S}_U$. We denote
the SU access rate achieved by such policy as $\epsilon_{th} \triangleq W_s(\mu)$.
The system is in the low SU access rate regime if $\epsilon_W \leq \epsilon_{th}$
in (17). Otherwise, the system is in the high SU access rate
regime. □

IV. Optimal Policy 

In this section, we characterize in closed form the optimal 
policy in the low SU access rate regime, and we present an 
algorithm to derive the optimal policy in the high SU access 
rate regime. 

A. Low SU Access Rate Regime 

The next lemma shows that, in the low SU access rate 
regime, an optimal policy prescribes that secondary access 
only takes place in the states where the PU message is known 
to SUrx, with an equal probability in all such states. It follows 
that only FIC, and not BIC, is needed in this regime to attain 
optimal performance. 

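Lemma 2. In the low SU access rate regime $\epsilon_W \leq \epsilon_{th}$, an
optimal policy is given by^6

$\mu^*(s) = \frac{\epsilon_W}{\epsilon_{th}}, \ \forall s \in \mathcal{S}_K, \quad \mu^*(s) = 0, \ \forall s \in \mathcal{S}_U.$ (18)
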

^5 Similarly to (10), (11) and (12), $\pi_\mu(s)$ is independent of the initial state
$s_0$, due to the recurrence of state $(1, 0, U)$.



^6 The optimal policy in the low SU access rate regime is not unique. In fact, any
policy $\mu$ such that $\mu(s) = 0$, $\forall s \in \mathcal{S}_U$ and $W_s(\mu) = \epsilon_W$ is optimal,
attaining the same throughput $\bar{T}_s(\mu) = T_{sK} \epsilon_W$ as (18).
Moreover, $\bar{T}_s(\mu^*) = T_{sK} \epsilon_W$, $\bar{P}_s(\mu^*) = P_s \epsilon_W$, and
$\bar{T}_p(\mu^*) = T_p^{(I)} - (T_p^{(I)} - T_p^{(A)}) \epsilon_W$.

□




Proof: For any policy $\mu \in \mathcal{U}$ obeying the SU access
rate constraint $W_s(\mu) \leq \epsilon_W$, we have $\bar{T}_s(\mu) \leq W_s(\mu) T_{sK} \leq \epsilon_W T_{sK}$.
The first inequality holds since $W_s(\mu) T_{sK}$ is the long-term
throughput achievable when the PU message is known a
priori at SUrx, which is an upper bound to the performance;
the second from the SU access rate constraint. The upper
bound $\epsilon_W T_{sK}$ is achieved by policy (18), as can be directly
seen by substituting (18) in (14), (15). ■
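
The closed form (18) can be written out as a short helper that builds the low-regime policy as a mapping from states to access probabilities. This is a sketch: the function name and the dictionary representation are illustrative, and $\epsilon_{th}$ is taken as an input here, since computing it requires the steady state distribution of the chain.

```python
def low_regime_policy(known_states, unknown_states, eps_w, eps_th):
    """Access policy (18): uniform access in known states, idle elsewhere.

    known_states / unknown_states: iterables of hashable state labels.
    eps_w: access rate bound; eps_th: access rate of the policy that
    always accesses in known states (an input in this sketch).
    """
    if eps_th <= 0.0 or not 0.0 <= eps_w <= eps_th:
        raise ValueError("the closed form (18) needs 0 <= eps_w <= eps_th, eps_th > 0")
    policy = {s: eps_w / eps_th for s in known_states}
    policy.update({s: 0.0 for s in unknown_states})
    return policy
```

For $\epsilon_W = 0.1$ and $\epsilon_{th} = 0.4$, the SU accesses the channel with probability 0.25 in every state where the PU message is known and never otherwise.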

Remark 2. Note that secondary accesses in states $\mathcal{S}_U$, where
the PU message is unknown to SUrx, would obtain a smaller
throughput, namely at most $T_{sU} + p_{s,buf} R_{sU} \leq T_{sK}$, where
$T_{sU}$ is the "instantaneous" throughput and $p_{s,buf} R_{sU}$ is the
BIC throughput, possibly recovered via BIC in a future ARQ
retransmission. Therefore, SU accesses in states $\mathcal{S}_K$ are more
"cost effective". □

B. High SU Access Rate Regime 

In this section, we study the high SU access rate regime
in which $\epsilon_W > \epsilon_{th}$, thus complementing the analysis above
for the regime where $\epsilon_W \leq \epsilon_{th}$. It will be seen that, if $\epsilon_W > \epsilon_{th}$,
unlike in the low SU access rate regime, the SU should
generally access the channel also in states $\mathcal{S}_U$ where the PU
message is unknown to SUrx in order to achieve the optimal
performance. Therefore, both BIC and FIC are necessary to 
attain optimality. In this section, we derive the optimal policy. 
We first introduce some necessary definitions and notations. 

Definition 2 (Secondary access efficiency). We define the
secondary access efficiency under policy $\mu \in \mathcal{U}$ in state $s \in \mathcal{S}$
as

$\eta_\mu(s) \triangleq \frac{d\bar{T}_s(\mu) / d\mu(s)}{dW_s(\mu) / d\mu(s)}.$ (19)

□




The secondary access efficiency can be interpreted as follows.
If the secondary access probability is increased in state
$s \in \mathcal{S}$ by a small amount $\delta$, then the PU throughput loss
is increased by an amount equal to $\delta (T_p^{(I)} - T_p^{(A)}) \frac{dW_s(\mu)}{d\mu(s)}$
(from (16)), the SU power is increased by an amount equal
to $\delta P_s \frac{dW_s(\mu)}{d\mu(s)}$ (from (14)), and the SU throughput augments
or diminishes by an amount equal to $\delta \frac{d\bar{T}_s(\mu)}{d\mu(s)}$ (depending
on the sign of the derivative). Therefore, $\eta_\mu(s)$ yields the
rate of increase (or decrease if $\eta_\mu(s) < 0$) of the SU
throughput per unit increase of the SU access rate, as induced
by augmenting the secondary channel access probability in
state $s$. Equivalently, it measures how efficiently the SU can
access the channel in state $s$, in terms of maximizing the SU
throughput gain while minimizing its negative impact on the
PU throughput and on the SU power expenditure.

Remark 3. It is worth noting that the definition of $\eta_\mu(s)$ given
in Def. 2 is not completely rigorous. In fact, under a generic
policy $\mu$, the Markov chain of the PU-SU system may not be
irreducible [23], so that state $s$ may not be accessible, hence
$\pi_\mu(s) = 0$ and $\frac{d\bar{T}_s(\mu)}{d\mu(s)} = \frac{dW_s(\mu)}{d\mu(s)} = 0$. One example is the idle
policy $\mu(s) = 0$, $\forall s$: since the SU is always idle, the buffer
at SUrx is always empty, hence states $(t, b, U)$ with $b > 0$ are
never accessed. To overcome this problem, a formal definition
is given in App. B by treating the Markov chain of the PU-SU
system as the limit of an irreducible Markov chain. $\eta_\mu(s)$ is
explicitly derived in Lemma 6 in App. B. □

We denote the indicator function of state $s$ as $\delta_s : \mathcal{S} \to \{0, 1\}$,
with $\delta_s(s) = 1$, $\delta_s(\sigma) = 0$, $\forall \sigma \neq s$. Moreover, we
denote the policy at the $i$th iteration of the algorithm as $\mu^{(i)}$.
We are now ready to describe the algorithm that obtains an 
optimal policy in the high SU access rate regime. An intuitive 
explanation of the algorithm can be found below. 

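Algorithm 1 (Derivation of the optimal policy).

1) Initialization:

• Let $\mu^{(0)}$ be the policy $\mu^{(0)}(s) = 0$, $\forall s \in \mathcal{S}_U$, $\mu^{(0)}(s) = 1$, $\forall s \in \mathcal{S}_K$, and $i = 0$.
• Let $\mathcal{S}_{idle}^{(0)} \equiv \{s \in \mathcal{S} : \mu^{(0)}(s) = 0\} \equiv \mathcal{S}_U$ be the set of states where the SU is idle.

2) Stage $i$:

a) Compute $\eta_{\mu^{(i)}}(s)$, $\forall s \in \mathcal{S}_{idle}^{(i)}$ and let $s^{(i)} = \arg\max_{s \in \mathcal{S}_{idle}^{(i)}} \eta_{\mu^{(i)}}(s)$.

b) If $\eta_{\mu^{(i)}}(s^{(i)}) \leq 0$, go to step 3). Otherwise, let $\mu^{(i+1)} = \mu^{(i)} + \delta_{s^{(i)}}$, $\mathcal{S}_{idle}^{(i+1)} = \mathcal{S}_{idle}^{(i)} \setminus \{s^{(i)}\}$.

c) Set $i := i + 1$. If $\mathcal{S}_{idle}^{(i)} = \emptyset$, go to step 3). Otherwise, repeat from step 2).

3) Let $N = i$, the sequence of states $(s^{(0)}, \dots, s^{(N-1)})$ and of policies $(\mu^{(0)}, \dots, \mu^{(N)})$.

4) Optimal policy: given $\epsilon_W$,

a) If $W_s(\mu^{(N)}) \leq \epsilon_W$, then $\mu^* = \mu^{(N)}$.

b) Otherwise, $\mu^* = \lambda \mu^{(j)} + (1 - \lambda) \mu^{(j+1)}$, where $j = \max\{i : W_s(\mu^{(i)}) \leq \epsilon_W\}$ and $\lambda \in (0, 1]$ uniquely
solves $W_s(\lambda \mu^{(j)} + (1 - \lambda) \mu^{(j+1)}) = \epsilon_W$. □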

The algorithm, starting from the optimal policy for the case 
$\epsilon_W = \epsilon_{th}$ (Lemma 2), ranks the states in the set $\mathcal{S}_U$ in
decreasing order of secondary access efficiency, and iteratively 
allocates the secondary access to the state with the highest 
efficiency, among the states where the SU is idle. The rationale 
of this step is that secondary access in the most efficient 
state yields the steepest increase of the SU throughput, per 
unit increase of the SU access rate or, equivalently, of the 
PU throughput loss and of the SU power expenditure. The 
optimality of Algorithm 1 is established in the following
theorem.

Theorem 1. Algorithm 1 returns an optimal policy for the
optimization problem (17). □

Proof: See App. |C] ■ 
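
The greedy construction of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the functions `efficiency` (standing in for $\eta_\mu(s)$) and `access_rate` (standing in for $W_s(\mu)$) are hypothetical callables supplied by the user, the list of policies is indexed from the initial policy onward, the stopping rule treats a best efficiency of exactly zero as "stop" (an assumption), and the mixing coefficient $\lambda$ of step 4) is found by bisection, which relies on $W_s$ being monotone along the segment between two consecutive policies. The numbers in the toy example are made up.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Policy = Dict[State, float]


def greedy_policies(
    known: Sequence[State],
    unknown: Sequence[State],
    efficiency: Callable[[Policy, State], float],
) -> Tuple[List[State], List[Policy]]:
    """Steps 1)-3): start from 'access in S_K only', activate one idle state per stage."""
    mu: Policy = {s: 1.0 for s in known}
    mu.update({s: 0.0 for s in unknown})
    idle = list(unknown)                    # initial idle set = S_U
    order: List[State] = []
    policies: List[Policy] = [dict(mu)]
    while idle:
        best = max(idle, key=lambda s: efficiency(mu, s))
        if efficiency(mu, best) <= 0:       # no idle state is worth activating
            break
        mu[best] = 1.0                      # mu^(i+1) = mu^(i) + delta_{s^(i)}
        idle.remove(best)
        order.append(best)
        policies.append(dict(mu))
    return order, policies


def select_policy(
    policies: List[Policy],
    access_rate: Callable[[Policy], float],
    eps_w: float,
    iters: int = 60,
) -> Policy:
    """Step 4): last policy if feasible, else a mixture that meets eps_w with equality.

    Assumes policies[0] is feasible (high SU access rate regime)."""
    if access_rate(policies[-1]) <= eps_w:
        return dict(policies[-1])
    j = max(i for i, p in enumerate(policies) if access_rate(p) <= eps_w)
    p_lo, p_hi = policies[j], policies[j + 1]

    def mix(lam: float) -> Policy:
        return {s: lam * p_lo[s] + (1.0 - lam) * p_hi[s] for s in p_lo}

    lam_low, lam_high = 0.0, 1.0            # W(mix(0)) > eps_w >= W(mix(1))
    for _ in range(iters):
        mid = 0.5 * (lam_low + lam_high)
        if access_rate(mix(mid)) > eps_w:
            lam_low = mid
        else:
            lam_high = mid
    return mix(lam_high)


# Toy example with made-up numbers (not taken from the paper).
ETA = {"u1": 3.0, "u2": 1.0, "u3": -1.0}
WEIGHT = {"k": 0.2, "u1": 0.3, "u2": 0.3, "u3": 0.2}
order, policies = greedy_policies(["k"], ["u1", "u2", "u3"], lambda mu, s: ETA[s])
mu_star = select_policy(
    policies, lambda mu: sum(WEIGHT[s] * mu[s] for s in mu), eps_w=0.65
)
```

In the toy example the efficiency does not depend on the current policy, whereas in the paper $\eta_{\mu^{(i)}}(s)$ must be recomputed at every stage; the callable interface above allows for that.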

V. Special Case: Degenerate Cognitive Radio Network Scenario

We point out that Algorithm 1 determines the optimal policy
for a generic set of system parameters. However, the resulting
optimal policy does not always have a structure that is easily
interpreted. In this section, we consider a special case of the
general model discussed so far, a degenerate cognitive radio
network, where the activity of the PU is unaffected by the
transmissions of the SU, i.e., the channel gain between the
SU transmitter and the PU receiver is zero.

Fig. 3. Degenerate cognitive radio network.

Consider the scenario depicted in Fig. 3, where PUrx is
outside the transmission range of SUtx, whereas SUrx is
inside the transmission range of both SUtx and PUtx. In this
scenario, the interference produced by the SU to the PU is negligible.
In contrast, the PU produces significant interference at the
SU receiver. The SU thus potentially benefits by employing
the BIC and FIC mechanisms. We denote this scenario as
a degenerate cognitive radio network, and we model it by
assuming that the SNR of the interfering link SUtx→PUrx is
deterministically equal to zero, i.e., $\gamma_{sp} = 0$. From (1), we
then have $q_{pp}^{(A)} = q_{pp}^{(I)} \triangleq q_{pp}$, i.e., the outage performance of
the PU is unaffected by the activity of the SU, and the primary
ARQ process is independent of the secondary access policy.
We define

$$\Delta_s \triangleq \frac{T_{sK} - T_{sU} - p_{s,\mathrm{buf}} R_{sU}}{R_{sU}}. \qquad (20)$$



From (9), it follows that $\Delta_s \ge 0$, with equality if $R_{sU} = R_{sK}$.
Therefore, $R_{sU}\Delta_s$ is the marginal throughput gain accrued in
the states where the PU message is known to SUrx, over the
throughput accrued in the states where the PU message is
unknown (instantaneous throughput $T_{sU}$ plus BIC throughput
$p_{s,\mathrm{buf}} R_{sU}$, possibly recovered in a future ARQ retransmission).
The following lemma proves that, if the marginal
throughput gain is "small", the secondary accesses in the
high SU access rate regime in a degenerate cognitive radio
network are allocated, in order, to the states in $\mathcal{S}_K$ (Lemma 2),
then to the idle states $(t, b, U)$ in $\mathcal{S}_U$, giving priority to states
with low $b$ and $t$ over states with high $b$ and $t$, respectively.
An illustrative example of the optimal policy for this scenario
is given in Fig. 4.
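
As a quick numerical illustration, the marginal gain can be evaluated from the rounded values listed in Table I. The sketch below assumes that $\Delta_s$ is the gain normalized by $R_{sU}$, so that $R_{sU}\Delta_s = T_{sK} - T_{sU} - p_{s,\mathrm{buf}} R_{sU}$ as described in the text; the function name `marginal_gain` is hypothetical.

```python
def marginal_gain(T_sK: float, T_sU: float, p_buf: float, R_sU: float) -> float:
    """Delta_s such that R_sU * Delta_s = T_sK - T_sU - p_buf * R_sU (assumed normalization)."""
    return (T_sK - T_sU - p_buf * R_sU) / R_sU


# Rounded values taken from Table I.
delta_star = marginal_gain(T_sK=1.10, T_sU=0.59, p_buf=0.26, R_sU=1.12)   # R_sU = R_sU^*
delta_equal = marginal_gain(T_sK=1.10, T_sU=0.40, p_buf=0.37, R_sU=1.91)  # R_sU = R_sK
```

With these rounded inputs, the first case gives a gain of roughly 0.2, while the second is zero up to the rounding of the table entries, in line with the statement that equality holds when $R_{sU} = R_{sK}$.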

Lemma 3. In the degenerate cognitive radio network scenario
with $q_{pp}^{(A)} = q_{pp}^{(I)} = q_{pp}$, if



A ,, < 



1 - Qp. 



(A) 



(A) 

qps 



(21) 



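the sequence of policies $(\mu^{(0)}, \ldots, \mu^{(N-1)})$ returned by Algorithm 1
is such that, $\forall i \in \mathbb{N}(0, N-1)$,

$$\mu^{(i)}(s) = 1, \quad \forall s \in \mathcal{S}_K, \qquad (22)$$

$$\mu^{(i)}(t,b,U) = \begin{cases} 1, & b < b^{(i)}(t) \\ 0, & b \ge b^{(i)}(t) \end{cases}, \quad \forall (t,b,U) \in \mathcal{S}_U, \qquad (23)$$

where $b^{(i)}(t)$ is non-increasing in $t$ and non-decreasing in $i$,
with $b^{(0)}(t) = 0$ and $b^{(N-1)}(t) = b_{\max}(t)$, i.e.,

$$b_{\max}(t) = b^{(N-1)}(t) \ge \cdots \ge b^{(i)}(t) \ge b^{(i-1)}(t) \ge \cdots \ge b^{(0)}(t) = 0, \qquad (24)$$

$$b^{(i)}(1) \ge b^{(i)}(2) \ge \cdots \ge b^{(i)}(t-1) \ge b^{(i)}(t) \ge \cdots \ge b^{(i)}(D), \qquad (25)$$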



where 



1 - Qpp {<li^^ - Qps^ Aoit + l) 



RsU 



'^Ipp (4^^ ~ 9ps ) Mit + i) 



(qi^^ - q?s) (1 - qpp{l - q^p])Ao(t + 1)) 



- 1 



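(26)

and we have defined

$$A_0(\tau) \triangleq \frac{1 - q_{pp}^{D-\tau+1} \big(q_{ps}^{(I)}\big)^{D-\tau+1}}{1 - q_{pp}\, q_{ps}^{(I)}}, \qquad (27)$$

$$A_1(\tau) \triangleq \frac{1 - q_{pp}^{D-\tau+1}}{1 - q_{pp}}. \qquad (28)$$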



Proof: See App. D.
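
The quantities $A_0(\tau)$ and $A_1(\tau)$ used in the lemma can be read as truncated geometric sums over the remaining ARQ rounds $\tau, \ldots, D$. The following sketch assumes the closed forms $A_1(\tau) = (1 - q_{pp}^{D-\tau+1})/(1 - q_{pp})$ and $A_0(\tau) = (1 - (q_{pp} q_{ps}^{(I)})^{D-\tau+1})/(1 - q_{pp} q_{ps}^{(I)})$, and checks them against explicit sums. The function names are hypothetical, and the numerical values are borrowed from Table I purely for illustration (Table I refers to a non-degenerate setting).

```python
def A0(tau: int, D: int, q_pp: float, q_ps_idle: float) -> float:
    """Sum of (q_pp * q_ps_idle)^k for k = 0..D-tau, in closed form."""
    x = q_pp * q_ps_idle
    return (1.0 - x ** (D - tau + 1)) / (1.0 - x)


def A1(tau: int, D: int, q_pp: float) -> float:
    """Sum of q_pp^k for k = 0..D-tau, in closed form."""
    return (1.0 - q_pp ** (D - tau + 1)) / (1.0 - q_pp)


D, q_pp, q_ps_idle = 5, 0.38, 0.61           # illustrative values only
a1_sum = sum(q_pp ** k for k in range(D))     # explicit sum for tau = 1
a0_sum = sum((q_pp * q_ps_idle) ** k for k in range(D))
```

Under this reading, $A_1(\tau)$ counts the expected number of slots left in the ARQ cycle from round $\tau$, and $A_0(\tau)$ counts those spent with the PU message still unknown when the SU stays idle, which matches how the two terms are combined in App. D.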



Remark 4. Interestingly, this is the same result derived in our
work [2] for $D = 2$. However, therein the result was shown
to hold for general $q_{pp}^{(A)} \ge q_{pp}^{(I)}$ (not necessarily a degenerate
cognitive radio network), whereas Lemma 3 holds for general
$D$ but only for a degenerate cognitive radio network scenario.

The lemma dictates that, in the degenerate cognitive radio
network scenario, the SU should restrict its channel accesses
to the states corresponding to a low primary ARQ index and
small buffer occupancy at the SU receiver. In other words, the
larger the ARQ index or the buffer occupancy, the smaller the
incentive to access the channel. By doing so, the SU maximizes
the buffer occupancy in the early HARQ retransmission
attempts, and invests in the future BIC recovery. When the
primary ARQ state $t$ approaches the deadline $D$, the SU is
incentivized to idle so as to help SUrx to decode the PU
message, thus enabling the recovery of the failed SU transmissions
from the buffered received signals via BIC, before
the ARQ deadline $D$ is reached and the buffer is depleted.



Moreover, when the buffer state $b$ grows, since $q_{ps}^{(A)} > q_{ps}^{(I)}$, the
instantaneous reward accrued by staying idle ($(1 - q_{ps}^{(I)}) b R_{sU}$)
approaches and, at some point, becomes larger than the reward
accrued by transmitting ($T_{sU} + (1 - q_{ps}^{(A)}) b R_{sU}$), hence the
incentive to stay idle grows. On the other hand, if $\Delta_s$ is large,
then the marginal throughput gain accrued in the states where
the PU message is known to SUrx, over the throughput accrued
in the states where the PU message is unknown, is large. The
SU is thus incentivized to stay idle in the initial ARQ rounds,
so as to help SUrx decode the PU message. Therefore, for
large $\Delta_s$, the optimal policy may not obey the structure of
Lemma 3.

Fig. 4. Illustrative example of the structure of the optimal secondary access
policy for the degenerate cognitive radio network; the SU is active in the
black states, idle in the white ones, and randomly accesses the channel in
the gray state; the arrows indicate the possible state transitions (transitions to
state (1, 0, U) are omitted).

As a final remark, note that, in the degenerate cognitive
radio network scenario, the only limitation to the activity
of the SU is the secondary power expenditure $\mathcal{P}_s(\mu)$, since
the primary throughput is unaffected. In the special case
P^^^' = Pg in ( fTSl ), neither the secondary power expenditure 
nor the primary throughput degradation limit the activity of
the SU, hence the optimal policy solves the unconstrained
maximization problem $\mu^* = \arg\max_\mu T_s(\mu)$, whose solution
follows as a corollary of Lemma 3.

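Corollary 1. In the degenerate cognitive radio network scenario,
the solution of the unconstrained optimization problem
$\mu^* = \arg\max_\mu T_s(\mu)$ yields

$$\mu^*(s) = 1, \quad \forall s \in \mathcal{S}_K, \qquad (29)$$

$$\mu^*(t,b,U) = \begin{cases} 1, & b < b_{\max}(t) \\ 0, & b \ge b_{\max}(t) \end{cases}, \quad \forall (t,b,U) \in \mathcal{S}_U, \qquad (30)$$

where $b_{\max}(t)$ is defined in (26).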
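TABLE I
PARAMETERS OF THE SU AND PU, FOR THE SNRS $\bar\gamma_s = 5$, $\bar\gamma_p = 10$,

PU:
   $R_p \approx 2.52$;  $q_{pp}^{(I)} \approx 0.38$;  $q_{pp}^{(A)} \approx 0.68$

SU, $R_{sU} = \arg\max_{R_s} T_{sU}(R_s, R_p)$:
   $R_{sU} \approx 1.12$;  $q_{ps}^{(I)} \approx 0.61$;  $R_{sK} \approx 1.91$
   $T_{sU} \approx 0.59$;  $q_{ps}^{(A)} \approx 0.74$;  $T_{sK} \approx 1.10$
   $p_{s,\mathrm{buf}} \approx 0.26$

SU, $R_{sU} = R_{sK}$:
   $R_{sU} \approx 1.91$;  $q_{ps}^{(I)} \approx 0.61$;  $R_{sK} \approx 1.91$
   $T_{sU} \approx 0.40$;  $q_{ps}^{(A)} \approx 0.88$;  $T_{sK} \approx 1.10$
   $p_{s,\mathrm{buf}} \approx 0.37$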



VI. Numerical Results 

We consider a scenario with Rayleigh fading channels, i.e.,
the SNR $\gamma_x$, $x \in \{s, p, sp, ps\}$, is an exponential random
variable with mean $\mathbb{E}[\gamma_x] = \bar\gamma_x$. We consider the following
parameters, unless otherwise stated. The average SNRs are
set to $\bar\gamma_s = \bar\gamma_{ps} = 5$, $\bar\gamma_p = 10$, $\bar\gamma_{sp} = 2$. The ARQ deadline is
$D = 5$. $R_{sK}$ is chosen as $R_{sK} = \arg\max_{R_s} T_{sK}(R_s)$. The PU
rate $R_p$ is chosen as the maximizer of the instantaneous PU
throughput under an idle SU, i.e., $R_p = \arg\max_R T_p^{(I)}(R)$.
For the rate $R_{sU}$, we evaluate the two cases $R_{sU} = R_{sU}^*$
and $R_{sU} = R_{sK}$, where $R_{sU}^* = \arg\max_{R_s} T_{sU}(R_s, R_p)$.
The former maximizes the instantaneous throughput under
interference from the PU, thus neglecting the buffering capability
at SUrx; therefore, the choice $R_{sU} = R_{sU}^*$ reflects a
pessimistic expectation of the ability of SUrx to decode the PU
message and to enable BIC. As to the latter, from ^ we have 
i?su = RsK = argmax_R^ Tsu{Rs,Rp) +Ps,buf (^s, ^p)^sK, 
hence $R_{sU} = R_{sK}$ maximizes the sum of the instantaneous
throughput and the future throughput possibly recovered via
BIC, thus reflecting an optimistic expectation of the ability
of SUrx to decode the PU message, which enables BIC. The
PU throughput loss constraint is set to $\epsilon_{PU} = 0.2$, and the
constraint on the SU power is set to $\mathcal{P}_s^{(\mathrm{th})} = P_s$ (inactive).
The resulting values of the system parameters are listed in
Table I.
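
The interference-free entries of Table I can be reproduced with a short script. This is a sketch under an assumption that is not stated in this section: a transmission at rate $R$ is taken to be in outage when $\log_2(1 + \gamma) < R$, with $\gamma$ exponentially distributed, so that the throughput is $R\, e^{-(2^R - 1)/\bar\gamma}$. The helper names and the grid search are illustrative choices. Under this assumption, the rates $R_p$ and $R_{sK}$ and the values $q_{pp}^{(I)}$ and $T_{sK}$ agree with the table to two decimals.

```python
import math


def outage(rate: float, mean_snr: float) -> float:
    """P[log2(1 + snr) < rate] for an exponentially distributed SNR (assumed outage model)."""
    return 1.0 - math.exp(-(2.0 ** rate - 1.0) / mean_snr)


def throughput(rate: float, mean_snr: float) -> float:
    """Rate times success probability, without interference."""
    return rate * (1.0 - outage(rate, mean_snr))


def best_rate(mean_snr: float, step: float = 1e-3, max_rate: float = 8.0) -> float:
    """Grid search for the rate that maximizes the interference-free throughput."""
    grid = [k * step for k in range(1, int(max_rate / step) + 1)]
    return max(grid, key=lambda r: throughput(r, mean_snr))


R_p = best_rate(10.0)            # PU rate with idle SU, mean SNR 10
q_pp_idle = outage(R_p, 10.0)    # PU outage probability with idle SU
R_sK = best_rate(5.0)            # SU rate when the PU message is known, mean SNR 5
T_sK = throughput(R_sK, 5.0)     # corresponding SU throughput
```

The quantities that involve interference ($T_{sU}$, $q_{ps}^{(A)}$, $q_{pp}^{(A)}$, $p_{s,\mathrm{buf}}$) depend on the joint decoding model defined earlier in the paper and are not covered by this sketch.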

We consider the following schemes: "FIC/BIC", which
employs both FIC and BIC; the optimal "FIC/BIC" policy is
derived using Algorithm 1 and Lemma 2; "FIC only", which
does not employ the buffering mechanism [1]; "no FIC/BIC",
which employs neither BIC nor FIC. In this case, the SU message
is decoded by leveraging the PU codebook structure [24];
however, possible knowledge of the PU message gained during
the decoding operation is only used in the slot where the PU
message is acquired, but is neglected in the past/future PU
retransmissions. For "no FIC/BIC", the optimal policy consists
in accessing the channel with a constant probability in all time-slots,
independently of the underlying state, so as to attain
the PU throughput loss constraint with equality. "PM known"
refers to an ideal scenario where SUrx perfectly knows the
current PU message in advance, and removes its interference;
specifically, SUtx transmits with rate $R_{sK}$, thus accruing the
throughput $T_{sK}$ at each secondary access; "PM known" thus
yields an upper bound to the performance of any other policy
considered.

Fig. 5. SU throughput vs PU throughput $T_p(\mu)$. $\bar\gamma_s = \bar\gamma_{ps} = 5$, $\bar\gamma_{sp} = 2$, $\bar\gamma_p = 10$.
The other parameters are given in Table I.
[Plot: curves for BIC/FIC and FIC only, each with $R_{sU} = R_{sU}^*$ and $R_{sU} = R_{sK}$, and for no BIC/FIC with $R_{sU} = R_{sU}^*$; the high and low secondary access rate regimes are marked.]

Fig. 6. SU throughput vs SNR ratio $\bar\gamma_{sp}/\bar\gamma_p$. PU throughput loss constraint
$\epsilon_{PU} = 0.2$. $\bar\gamma_s = \bar\gamma_{ps} = 5$, $\bar\gamma_p = 10$. $R_{sU} = R_{sU}^*$.

In Fig. 5, we plot the SU throughput versus the PU
throughput, obtained by varying the SU access rate constraint
$\epsilon_W$ in (17) from 0 to 1. As expected, the best performance
is attained by "FIC/BIC", since the joint use of BIC and FIC
enables IC at SUrx over the entire sequence of PU retransmissions.
"FIC only" incurs a throughput penalty (except in
the low SU access rate regime $T_p(\mu) > 1.37$ where, from
Lemma 2, "FIC/BIC" does not employ BIC), since the SU
transmissions which undergo outage due to severe interference
from the PU are simply dropped. "no FIC/BIC" incurs a
further throughput loss, since possible knowledge about the
PU message is not exploited to perform IC. Concerning the
choice of the transmission rates, we note that the selection
$R_{sU} = R_{sU}^*$ outperforms $R_{sU} = R_{sK}$ for the scenario
considered. Note that, with $R_{sU} = R_{sU}^*$, the SU accrues
a larger instantaneous throughput ($T_{sU}$), but FIC and BIC
are impaired, since both the buffering probability dS), Ps,hui, 
and the probability that SUrx does not successfully decode
the PU message, $q_{ps}^{(A)}$, diminish. Hence, in this case the
instantaneous throughput maximization has a stronger impact
on the performance than enabling FIC/BIC at SUrx.

In Fig. 6, we plot the SU throughput versus the SNR ratio
$\bar\gamma_{sp}/\bar\gamma_p$, where $\bar\gamma_s = 5$ and $R_{sU} = R_{sU}^*$. Note that, for
$\bar\gamma_{sp}/\bar\gamma_p < 0.5$, the SU throughput increases. In fact, in this
regime the activity of the SU causes little harm to the PU, and
the constraint on the PU throughput loss is inactive. The SU
thus maximizes its own throughput. As $\bar\gamma_{sp}$ increases from 0
to $0.5\bar\gamma_p$, the activity of the SU induces more frequent primary
ARQ retransmissions, hence there are more IC opportunities
available and the SU throughput increases. On the other
hand, as $\bar\gamma_{sp}$ grows beyond $0.5\bar\gamma_p$, the constraint on the PU
throughput loss becomes active, secondary accesses become
more and more harmful to the PU and take place more and
more sparingly, hence the SU throughput degrades.

In Fig. 7, we plot the SU throughput versus the SNR ratio
$\bar\gamma_{ps}/\bar\gamma_s$, where $\bar\gamma_s = 5$ and $R_{sU} = R_{sU}^*$, which is a function of
$\bar\gamma_{ps}$. We notice that, when $\bar\gamma_{ps} = 0$, the upper bound is achieved
with equality, since the SU operates under no interference from
the PU. The upper bound is approached also for $\bar\gamma_{ps} \gg \bar\gamma_s$,
corresponding to a strong interference regime where, with high
probability, SUrx can successfully decode the PU message,
remove its interference from the received signal, and then
attempt to decode the SU message. The worst performance is
attained when $\bar\gamma_{ps} \simeq \bar\gamma_s/2$. In fact, the interference from the
PU is neither weak enough to be simply treated as noise, nor
strong enough to be successfully decoded and then removed.

Fig. 7. SU throughput vs SNR ratio $\bar\gamma_{ps}/\bar\gamma_s$. PU throughput loss constraint
$\epsilon_{PU} = 0.2$. $\bar\gamma_s = 5$, $\bar\gamma_{sp} = 2$, $\bar\gamma_p = 10$. $R_{sU} = R_{sU}^*$.

In Fig. 8, we plot the SU throughput versus the SU rate
ratio $R_{sU}/R_{sK}$, where $R_{sK} = 1.91$ is kept fixed. Clearly, "no
FIC/BIC" attains the best performance for $R_{sU} = R_{sU}^*$, which
maximizes the throughput $T_{sU}(R_{sU}, R_p)$ achieved when neither
FIC nor BIC is used. On the other hand, the performance
of "FIC/BIC" is maximized for a slightly larger value of
$R_{sU}$. In fact, this value reflects the optimal trade-off between
maximizing the throughput $T_{sU}$ ($R_{sU} \simeq 0.59 R_{sK}$ in Fig. 9),
maximizing the buffering probability $p_{s,\mathrm{buf}}$ ($R_{sU} \to R_{sK}$), and
minimizing the probability that SUrx does not successfully
decode the PU message, $q_{ps}^{(A)}$ ($R_{sU} \to 0$). Finally, "FIC only"
is optimized by $R_{sU} \simeq 0.52 R_{sK} < R_{sU}^*$. Since "FIC only"
does not use BIC, this value reflects the optimal trade-off
between maximizing $T_{sU}$ and minimizing $q_{ps}^{(A)}$ ($R_{sU} \to 0$).

Fig. 8. SU throughput vs SU rate ratio $R_{sU}/R_{sK}$. $R_{sK} = 1.91$ is kept
fixed. PU throughput loss constraint $\epsilon_{PU} = 0.2$. $\bar\gamma_s = 5$, $\bar\gamma_{sp} = 2$, $\bar\gamma_p = 10$,
$\bar\gamma_{ps} = 5$.

Fig. 9. Probabilities $p_{s,\mathrm{buf}}$, $1 - q_{ps}^{(A)}$ and normalized SU throughput $T_{sU}/T_{sK}$
vs the SU rate ratio $R_{sU}/R_{sK}$. $R_{sK} = 1.91$ is kept fixed. $\bar\gamma_s = \bar\gamma_{ps} = 5$,
$\bar\gamma_{sp} = 2$, $\bar\gamma_p = 10$.

Fig. 10. SU throughput vs ARQ deadline $D$. PU throughput loss constraint
$\epsilon_{PU} = 0.2$. $\bar\gamma_s = \bar\gamma_{ps} = 5$, $\bar\gamma_{sp} = 2$, $\bar\gamma_p = 10$. $R_{sU} = R_{sU}^*$.

In Fig. 10, we plot the SU throughput versus the ARQ
deadline $D$. We notice that, when $D = 1$, all the IC
mechanisms considered attain the same performance as "no
FIC/BIC". In fact, this is a degenerate scenario where the PU
does not employ ARQ, hence no redundancy is introduced in
the primary transmission process. Interestingly, by employing
FIC or BIC, the performance improves as $D$ increases. In
fact, the larger $D$, the more redundancy is introduced by the
primary ARQ process, and hence the more opportunities there are for
FIC/BIC at SUrx.

VII. Conclusion 

In this work, we have investigated the idea of leveraging 
the redundancy introduced by the ARQ protocol implemented 
by a Primary User (PU) to perform Interference Cancellation 
(IC) at the receiver of a Secondary User (SU) pair: the SU 



receiver (SUrx), after decoding the PU message, exploits this 
knowledge to perform Forward IC (FIC) in the following ARQ 
retransmissions and Backward IC (BIC) in the previous ARQ 
retransmissions, corresponding to SU transmissions whose 
decoding failed due to severe interference from the PU. We 
have employed a stochastic optimization approach to optimize 
the SU access strategy which maximizes the average long-term 
SU throughput, under constraints on the average long-term PU 
throughput degradation and SU power expenditure. We have 
proved that the SU prioritizes its channel accesses in the states 
where SUrx knows the PU message, thus enabling FIC, and 
we have provided an algorithm to optimally allocate additional 
secondary access opportunities in the states where the PU 
message is unknown. Finally, we have shown numerically the 
throughput gain of the proposed schemes. 

Appendix A 

In this appendix, we compute $T_s(\mu)$, $W_s(\mu)$ and state
properties of $W_s(\mu)$.

Definition 3. We define $G_\mu(t,b,\Phi)$, $V_\mu(t,b,\Phi)$ and
$D_\mu(t,b,\Phi)$ as the average throughput, the average number
of secondary channel accesses and the average number of
time-slots, respectively, accrued starting from state $(t,b,\Phi)$
until the end of the primary ARQ cycle under policy $\mu$ (i.e.,
until the recurrent state $(1,0,U)$ is reached). Starting from
$X_\mu(D+1,b,\Phi) = 0$, $\forall b$, $\forall \Phi \in \{U,K\}$,¹ where $X$ stands
for $G$, $V$ or $D$ (we write $X \in \{G,V,D\}$), these are
defined recursively as, for $t \in \mathbb{N}(1,D)$, $b \in \mathbb{N}(0,t-1)$,

$$X_\mu(t,b,U) = x_\mu(t,b,U) + \Pr_\mu(t+1,b,U \mid t,b,U)\, X_\mu(t+1,b,U)$$
$$\qquad + \Pr_\mu(t+1,b+1,U \mid t,b,U)\, X_\mu(t+1,b+1,U)$$
$$\qquad + \Pr_\mu(t+1,0,K \mid t,b,U)\, X_\mu(t+1,0,K),$$
$$X_\mu(t,0,K) = x_\mu(t,0,K) + \Pr_\mu(t+1,0,K \mid t,0,K)\, X_\mu(t+1,0,K), \qquad (31)$$

where $x_\mu(t,b,\Phi)$ is the cost/reward accrued in state $(t,b,\Phi)$
and $\Pr_\mu(\cdot \mid \cdot)$ is the one-step transition probability, which can be

¹ We introduce the fictitious state $(D+1, b, \Phi)$ for notational convenience.
TABLE II
Transition probabilities. $X \in \{A, I\}$ denotes the action of the SU: active (A) or idle (I)

From \ To | (1,0,U), X ∈ {A,I} | (t+1,b,U), A | (t+1,b,U), I | (t+1,b+1,U), A | (t+1,b+1,U), I | (t+1,0,K), X ∈ {A,I}
(t,b,U) | $1 - q_{pp}^{(X)}$ | $q_{pp}^{(A)}(q_{ps}^{(A)} - p_{s,\mathrm{buf}})$ | $q_{pp}^{(I)} q_{ps}^{(I)}$ | $q_{pp}^{(A)} p_{s,\mathrm{buf}}$ | 0 | $q_{pp}^{(X)}(1 - q_{ps}^{(X)})$
(D,b,U) | 1 | 0 | 0 | 0 | 0 | 0
(t,0,K) | $1 - q_{pp}^{(X)}$ | 0 | 0 | 0 | 0 | $q_{pp}^{(X)}$
(D,0,K) | 1 | 0 | 0 | 0 | 0 | 0












derived with the help of Table II by taking the expectation with
respect to the actions SU idle (I, with probability $1 - \mu(t,b,\Phi)$)
and SU active (A, with probability $\mu(t,b,\Phi)$), yielding

$$\Pr_\mu(t+1,b,U \mid t,b,U) = \mu(t,b,U)\, q_{pp}^{(A)} (q_{ps}^{(A)} - p_{s,\mathrm{buf}}) + (1 - \mu(t,b,U))\, q_{pp}^{(I)} q_{ps}^{(I)}, \qquad (32)$$
$$\Pr_\mu(t+1,b+1,U \mid t,b,U) = \mu(t,b,U)\, q_{pp}^{(A)} p_{s,\mathrm{buf}}, \qquad (33)$$
$$\Pr_\mu(t+1,0,K \mid t,b,U) = \mu(t,b,U)\, q_{pp}^{(A)} (1 - q_{ps}^{(A)}) + (1 - \mu(t,b,U))\, q_{pp}^{(I)} (1 - q_{ps}^{(I)}). \qquad (34)$$

Namely, if $X = G$ (throughput), then $x_\mu(t,b,\Phi)$, $\Phi \in \{U, K\}$,
is the expected throughput accrued in state $(t,b,\Phi)$,
and is given by

$$x_\mu(t,0,K) = \mu(t,0,K)\, T_{sK}, \qquad (35)$$
$$x_\mu(t,b,U) = \mu(t,b,U)\, T_{sU} + \left[\mu(t,b,U)(1 - q_{ps}^{(A)}) + (1 - \mu(t,b,U))(1 - q_{ps}^{(I)})\right] b R_{sU}, \qquad (36)$$

where the second term in (36) accounts for the successful
recovery of the $b$ SU messages from the buffered received
signals via BIC, when the PU message is decoded by SUrx; if
$X = V$ (secondary access), then $x_\mu(t,b,\Phi)$ is the SU access
probability in state $(t,b,\Phi)$, i.e.,

$$x_\mu(t,b,\Phi) = \mu(t,b,\Phi) \triangleq v_\mu(t,b,\Phi); \qquad (37)$$

finally, if $X = D$ (time-slots), then

$$x_\mu(t,b,\Phi) = 1, \qquad (38)$$

corresponding to one time-slot. Moreover, we define, for $X \in \{G,V,D\}$,

$$X'_\mu(s) \triangleq \frac{dX_\mu(s)}{d\mu(s)}. \qquad (39)$$
□



The number of visits to state $(1,0,U)$ up to time-slot $n$ is
a renewal process [25]. Each renewal interval (i.e., the ARQ
sequence in which the PU attempts to deliver a specific packet)
has average duration $D_\mu(1,0,U)$, over which the expected
accrued SU throughput is $G_\mu(1,0,U)$, and the expected
number of secondary channel accesses is $V_\mu(1,0,U)$. Then,
the following lemma directly follows from the strong law of
large numbers for renewal-reward processes [25].



Lemma 4. The average long-term SU throughput and access
rate are given by $T_s(\mu) = \frac{G_\mu(1,0,U)}{D_\mu(1,0,U)}$ and $W_s(\mu) = \frac{V_\mu(1,0,U)}{D_\mu(1,0,U)}$,
respectively. □
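
The recursion of Definition 3 and the ratios of Lemma 4 lend themselves to a direct numerical evaluation. The sketch below is an illustrative implementation under stated assumptions: the policy is passed as a hypothetical callable `mu(t, b, phi)`, the transition probabilities follow (32)-(34), the reward in the known states is taken to be $\mu(t,0,K)\,T_{sK}$, and the numerical parameters are the rounded values of the first SU case of Table I.

```python
from functools import lru_cache
from typing import Callable, Dict, Tuple

Triple = Tuple[float, float, float]


def cycle_values(mu: Callable[[int, int, str], float], D: int,
                 q_pp: Dict[str, float], q_ps: Dict[str, float],
                 p_buf: float, T_sU: float, T_sK: float, R_sU: float) -> Triple:
    """Return (G, V, D) accrued over one ARQ cycle starting from state (1, 0, 'U')."""

    @lru_cache(maxsize=None)
    def X(t: int, b: int, phi: str) -> Triple:
        if t > D:                                   # fictitious state (D+1, b, phi)
            return (0.0, 0.0, 0.0)
        m = mu(t, b, phi)
        if phi == "K":
            stay = m * q_pp["A"] + (1 - m) * q_pp["I"]
            g, v, d = X(t + 1, 0, "K")
            return (m * T_sK + stay * g, m + stay * v, 1.0 + stay * d)
        same = m * q_pp["A"] * (q_ps["A"] - p_buf) + (1 - m) * q_pp["I"] * q_ps["I"]      # (32)
        more = m * q_pp["A"] * p_buf                                                      # (33)
        known = m * q_pp["A"] * (1 - q_ps["A"]) + (1 - m) * q_pp["I"] * (1 - q_ps["I"])   # (34)
        decode = m * (1 - q_ps["A"]) + (1 - m) * (1 - q_ps["I"])
        reward = (m * T_sU + decode * b * R_sU, m, 1.0)
        nxt = (X(t + 1, b, "U"), X(t + 1, b + 1, "U"), X(t + 1, 0, "K"))
        return tuple(
            reward[k] + same * nxt[0][k] + more * nxt[1][k] + known * nxt[2][k]
            for k in range(3)
        )

    return X(1, 0, "U")


def long_term(mu: Callable[[int, int, str], float], **params) -> Tuple[float, float]:
    """Lemma 4: throughput G/D and access rate V/D at the renewal state."""
    G, V, Dur = cycle_values(mu, **params)
    return G / Dur, V / Dur


PARAMS = dict(D=5, q_pp={"A": 0.68, "I": 0.38}, q_ps={"A": 0.74, "I": 0.61},
              p_buf=0.26, T_sU=0.59, T_sK=1.10, R_sU=1.12)

T_idle, W_idle = long_term(lambda t, b, phi: 0.0, **PARAMS)
T_known, W_known = long_term(lambda t, b, phi: 1.0 if phi == "K" else 0.0, **PARAMS)
T_all, W_all = long_term(lambda t, b, phi: 1.0, **PARAMS)
```

The second policy above accesses the channel only in the states where the PU message is known, which corresponds to the initialization of Algorithm 1; the first and third are the always-idle and always-active extremes.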

We have the following lemma. 

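Lemma 5. We have

$$\frac{dW_s(\mu)}{d\mu(s)} \ge 0, \quad \forall s \in \mathcal{S},\ \forall \mu \in \mathcal{U}. \qquad (40)$$

The inequality is strict if and only if state $s$ is accessible
from $(1,0,U)$ under policy $\mu$, i.e., $\exists\, n > 0$ :
$\Pr_\mu^{(n)}(s \mid (1,0,U)) > 0$. Moreover, for all $s \in \mathcal{S}$ we have

$$V'_\mu(s) - D'_\mu(s)\, W_s(\mu) > 0. \qquad (41)$$
□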



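Proof: If state $s$ is not accessible from state $(1,0,U)$
under policy $\mu$, then the steady state distribution satisfies
$\pi_\mu(s) = 0$, hence $W_s(\mu)$ is unaffected by $\mu(s)$. Otherwise,
from Lemma 4 we have that

$$\frac{dW_s(\mu)}{d\mu(s)} = \frac{1}{D_\mu(1,0,U)} \left[ \frac{dV_\mu(1,0,U)}{d\mu(s)} - W_s(\mu) \frac{dD_\mu(1,0,U)}{d\mu(s)} \right] \propto V'_\mu(s) - D'_\mu(s)\, W_s(\mu), \qquad (42)$$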



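where $\propto$ represents equality up to a positive multiplicative
factor, and the right hand side holds since, $\forall X \in \{V, D\}$ and
$s = (t,b,\Phi) \in \mathcal{S}$, $\frac{dX_\mu(1,0,U)}{d\mu(s)} = \Pr_\mu^{(t-1)}(t,b,\Phi \mid 1,0,U)\, X'_\mu(t,b,\Phi)$.
If $s \in \mathcal{S}_K$, i.e., $s = (t,0,K)$, we have

$$\frac{dW_s(\mu)}{d\mu(t,0,K)} \propto V'_\mu(t,0,K) - D'_\mu(t,0,K)\, W_s(\mu) \ge V'_\mu(t,0,K) - D'_\mu(t,0,K) \triangleq A_\mu(t), \qquad (43)$$

where, from (31), we have used the fact that $D'_\mu(t,0,K) = (q_{pp}^{(A)} - q_{pp}^{(I)})\, D_\mu(t+1,0,K) \ge 0$ and $W_s(\mu) \le 1$.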



We now prove by induction that $A_\mu(t) > 0$, $\forall t \in \mathbb{N}(1,D)$,
so that (40) and (41) follow for $s \in \mathcal{S}_K$. From (31), for $t < D$,
after algebraic manipulation we obtain

$$A_\mu(t) = 1 + (q_{pp}^{(A)} - q_{pp}^{(I)}) \left[ V_\mu(t+1,0,K) - D_\mu(t+1,0,K) \right]$$
$$\qquad = 1 - q_{pp}^{(A)} + \Pr_\mu(t+2,0,K \mid t+1,0,K)\, A_\mu(t+1). \qquad (44)$$

Since $A_\mu(D) = 1 > 0$, we obtain $A_\mu(t) > 0$ by induction.

If $s \in \mathcal{S}_U$, i.e., $s = (t,b,U)$, we have

$$\frac{dW_s(\mu)}{d\mu(t,b,U)} \propto V'_\mu(t,b,U) - D'_\mu(t,b,U)\, W_s(\mu). \qquad (45)$$
We prove that $V'_\mu(t,b,U) - D'_\mu(t,b,U)\, W_s(\mu) > 0$ in two
steps, so that (40) and (41) follow for $s \in \mathcal{S}_U$. First, we prove
that $C_\mu(t,b) \triangleq D'_\mu(t,b,U) \ge 0$. Then, since $W_s(\mu) \le 1$, we
obtain



^V'{t,b,\J)~C^{t,b)W,{fi) 



.„(A)_ (I) 

^pp Hpp 



> Y'^{t, b, U) - n'^{t, b, U) ^ b). (46) 

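Finally, we prove that $B_\mu(t,b) > 0$.

Proof of $C_\mu(t,b) \ge 0$: from (31), for $t < D$ we have

$$C_\mu(t,b) = \left( q_{pp}^{(A)}(1 - q_{ps}^{(A)}) - q_{pp}^{(I)}(1 - q_{ps}^{(I)}) \right) D_\mu(t+1,0,K)$$
$$\qquad + \left( q_{pp}^{(A)}(q_{ps}^{(A)} - p_{s,\mathrm{buf}}) - q_{pp}^{(I)} q_{ps}^{(I)} \right) D_\mu(t+1,b,U)$$
$$\qquad + q_{pp}^{(A)} p_{s,\mathrm{buf}}\, D_\mu(t+1,b+1,U). \qquad (47)$$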

Using the recursions (ISTT i and rearranging the terms, we obtain 
the recursive expression 

C^{t, 6) =Pi>(t+2, 6+2, U|< +!,&+!, U)C,,(t + 1, & + 1) 
Pr^(t + 2, fe, U|t + 1, b, \J)C^{t + 1, 6) 

+ [(i-A*(t + i,o,K))gm(i-gm) 

+/i(t + l,0,K)4A)(i_g(A))] (g(A) _5(i))D^(t + 2,0,K). 

Since Cf,{D,b) = 0, V 6 £ N(0, £> - 1), it follows by 
induction on t that C^{b,t) > 0. 

Proof of B^{t,b) > 0: From (EB, for t < I? we obtain 
the following recursive expression for b), after algebraic 
manipulation, 

B^{t, b) = l- q^f) + Pi>(t + 2, 6, U|i + 1, 6, U)S^(t + 1, fe) 
+ Pr^(t + 2, 6 + 2, U|< + 1, + 1, \5)B^{t + 1,6+1) 
+ [(l-A*(t + l,0,K))gm(l-gW) 

+/i(t + l,0,K)4^)(i_q(A))] A^{t + l), (48) 

here is defined in ( |43l l. The result follows by induction, 

since S^Cf, 6) = 1 > and A^,{t + 1) > 0. ■ 

Appendix B 

In this appendix, we give a rigorous definition of secondary
access efficiency, thus complementing Def. 2. Moreover, in
Lemma 6 we derive it. We recall that $\Pr_\mu^{(n)}(s \mid s_0)$ is the $n$-step
transition probability of the chain from $s_0$ to $s$.

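Definition 4. Let $\tilde\mu \in \mathcal{U}$ be a policy such that $\exists\, n > 0$ :
$\Pr_{\tilde\mu}^{(n)}(s \mid (1,0,U)) > 0$, and $\mu_v \triangleq (1 - v)\mu + v\tilde\mu$, where
$v \in (0, 1]$, $\mu \in \mathcal{U}$. We define the secondary access efficiency
under policy $\mu$ in state $s \in \mathcal{S}$ as

$$\eta_\mu(s) = \lim_{v \to 0^+} \frac{\dfrac{dT_s(\mu_v)}{d\mu_v(s)}}{\dfrac{dW_s(\mu_v)}{d\mu_v(s)}}.$$
□

Remark 5. Notice that the condition $\exists\, n > 0$ :
$\Pr_{\tilde\mu}^{(n)}(s \mid (1,0,U)) > 0$ guarantees that state $s$ is accessible
from state $(1,0,U)$ under policy $\mu_v$, for $v > 0$. Under this
condition, $\frac{dW_s(\mu_v)}{d\mu_v(s)} > 0$ (Lemma 5 in App. A), hence the
fraction within the limit is well defined for $v > 0$ and in the
limit $v \to 0^+$. One such policy is $\tilde\mu(s) = 0.5$, $\forall s \in \mathcal{S}$. □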

Using Lemma 4 and Def. 3 in App. A and Def. 4, $\eta_\mu(s)$
can be derived according to the following lemma.

Lemma 6. We have $\eta_\mu(s) = \dfrac{G'_\mu(s) - D'_\mu(s)\, T_s(\mu)}{V'_\mu(s) - D'_\mu(s)\, W_s(\mu)}$. □

Remark 6. This is well defined, since $V'_\mu(s) - D'_\mu(s)\, W_s(\mu) > 0$
from Lemma 5 in App. A. □
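
The ratio form of the efficiency can be recovered by a short calculation. The sketch below is not the authors' proof: it assumes that state $s$ is accessible from $(1,0,U)$, and that the identity $\frac{dX_\mu(1,0,U)}{d\mu(s)} = c_\mu(s)\, X'_\mu(s)$, stated for $X \in \{V, D\}$ in the proof of Lemma 5 with a positive visit probability $c_\mu(s)$, also holds for $X = G$.

```latex
\begin{align*}
\frac{dT_s(\mu)}{d\mu(s)}
  &= \frac{d}{d\mu(s)} \frac{G_\mu(1,0,U)}{D_\mu(1,0,U)}
   = \frac{c_\mu(s)}{D_\mu(1,0,U)} \left[ G'_\mu(s) - D'_\mu(s)\, T_s(\mu) \right], \\
\frac{dW_s(\mu)}{d\mu(s)}
  &= \frac{d}{d\mu(s)} \frac{V_\mu(1,0,U)}{D_\mu(1,0,U)}
   = \frac{c_\mu(s)}{D_\mu(1,0,U)} \left[ V'_\mu(s) - D'_\mu(s)\, W_s(\mu) \right], \\
\eta_\mu(s)
  &= \frac{dT_s(\mu)/d\mu(s)}{dW_s(\mu)/d\mu(s)}
   = \frac{G'_\mu(s) - D'_\mu(s)\, T_s(\mu)}{V'_\mu(s) - D'_\mu(s)\, W_s(\mu)}.
\end{align*}
```

The common factor $c_\mu(s)/D_\mu(1,0,U)$ cancels in the ratio. In the degenerate scenario, where the primary ARQ process does not depend on the SU policy and hence $D'_\mu(s) = 0$, the expression reduces to $G'_\mu(s)/V'_\mu(s)$.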

Appendix C 

Proof of Theorem 1: In the first part of the proof,
we prove that, by initializing Algorithm 1 with the idle policy
$\mu^{(0)}$, $\mu^{(0)}(s) = 0$, $\forall s \in \mathcal{S}$, and with the set of idle states
$\mathcal{S}_{\mathrm{idle}}^{(0)} \equiv \mathcal{S}$, we obtain an optimal policy. In the second part of
the proof, we prove the optimality of the specific initialization
of Algorithm 1 for the high SU access rate regime.

Let $\tilde\mu$ be a policy under which all states $s \in \mathcal{S}$ are accessible
from state $(1,0,U)$, i.e., $\exists\, n > 0$ : $\Pr_{\tilde\mu}^{(n)}(s \mid (1,0,U)) > 0$.
One such policy is $\tilde\mu(s) = 0.5$, $\forall s \in \mathcal{S}$. Consider a modified
Markov Decision Process, parameterized by $v \in (0, 1)$,
obtained by applying the policy $(1 - v)\mu + v\tilde\mu$ to the original
system, where $\mu \in \mathcal{U}$. Since $\tilde\mu \in \mathcal{U}$ and $v \in (0, 1)$, it
follows that $(1 - v)\mu + v\tilde\mu \in \mathcal{U}$. We define $T_s(\mu, v) =
T_s((1 - v)\mu + v\tilde\mu)$ and $W_s(\mu, v) = W_s((1 - v)\mu + v\tilde\mu)$,
and we study the problem

$$\mu^{*(v)} = \arg\max_{\mu \in \mathcal{U}} T_s(\mu, v) \quad \text{s.t.} \quad W_s(\mu, v) \le \epsilon_W, \qquad (49)$$

where the parameter $v$ is small enough to guarantee a feasible
problem, i.e., $\exists\, \mu \in \mathcal{U}$ : $W_s(\mu, v) \le \epsilon_W$. (17) is obtained
in the limit $v \to 0^+$. Notice that, $\forall \mu \in \mathcal{U}$, under policy
$(1 - v)\mu + v\tilde\mu$, all the states $s \in \mathcal{S}$ are accessible from
state $(1,0,U)$, and the Markov chain is irreducible. Hence,
from Lemma 5 in App. A, $W_s(\mu, v)$ is a strictly increasing
function of $\mu(s)$, $\forall s \in \mathcal{S}$. This property is used repeatedly in
the following proof.

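Let $\mathcal{D} \subset \mathcal{U}$ be the set of all the deterministic policies,
and $\mathcal{G}_v = \{(W_s(\mu, v), T_s(\mu, v)),\ \mu \in \mathcal{D}\}$. With the help of
Fig. 11, for any $\mu \in \mathcal{U}$ we have that $(W_s(\mu, v), T_s(\mu, v)) \in
\mathrm{conv}(\mathcal{G}_v)$, where $\mathrm{conv}(\mathcal{G}_v)$ is the convex hull of the
set $\mathcal{G}_v$. In particular, for the optimal policy we have
$(W_s(\mu^{*(v)}, v), T_s(\mu^{*(v)}, v)) \in \mathrm{bd}(\mathcal{G}_v)$, where $\mathrm{bd}(\mathcal{G}_v)$ denotes
the boundary of $\mathrm{conv}(\mathcal{G}_v)$.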

Algorithm 1 determines the sequence of vertices of the
polyline $\mathrm{bd}(\mathcal{G}_v)$ in the limit $v \to 0^+$ (bold line in Fig. 11).
For $v > 0$, starting from the leftmost vertex of $\mathrm{bd}(\mathcal{G}_v)$,
achieved by the idle policy $\mu^{(0)}(s) = 0$, $\forall s \in \mathcal{S}$ (this
follows from the fact that $W_s(\mu, v)$ is a strictly increasing
function of $\mu(s)$, hence it is minimized by the idle policy),
the algorithm determines iteratively the next vertex of $\mathrm{bd}(\mathcal{G}_v)$
as the maximizer of the slope



(50) 



condition. 



d^i(s) 



= argmax - - 

Since ( fTTj i has one constraint, the optimal policy is 
randomized in one state f22l, and hence each segment on 
the boundary hd{Q^) between pairs (l^s(^(*\ f), Ts(^(*\ u)) 
Fig. 11. Geometric interpretation of problem (49).



achievable with deterministic policies is attained by a policy
that is randomized in only one state. It follows that $\mu^{(i)}$ and
$\mu^{(i+1)}$ differ in only one state. Moreover, in (50), the maximization
is over $\mu \in \mathcal{D}$ such that $W_s(\mu, v) > W_s(\mu^{(i)}, v)$,
i.e., since $W_s(\mu, v)$ is a strictly increasing function of $\mu(s)$ and
$\mu^{(i+1)}$ and $\mu^{(i)}$ differ in only one position, $\mu^{(i+1)}$ is obtained
from $\mu^{(i)}$ by allocating one more secondary access to a state
which is idle under $\mu^{(i)}$. In (50), the maximization is thus



over < fi^^^ 



s e S 



idle 



tion, in dSO]) maximizes 



|, and, after algebraic manipula- 



max 



max 



^(l~u)/j(*)+i;/i 



(s). 



Stage i of the algorithm is thus proved. If ?7(i_,;)^(i)+„/i(s) < 
0, we have (/iW + (5^, u) > Ws{ti^'\v) and 



If this condition holds 



V s G 5j^dic' ^'^y ^^^^ vertex of the polyline bd(^„) yields a 
decrease of the SU throughput and a larger SU access rate, 
hence a sub-optimal set of policies, and the algorithm stops. 

By construction, the algorithm returns a sequence of policies 
(/Lt^^^i e N(0, — 1)), characterized by strictly increasing 
values of the SU throughput and of the SU access rate. 
The optimal policy belongs to the polyline with vertices 
V„ = {(ir«(^«,i;),T,(M«,w)), j e N(0,iV - 1)}, denoted 
by pl(Vi,) in Fig. [TT] Then, ( [TtI i becomes equivalent to 
T/^^' = max Ts s.t. Wg < ewi whose solution is given 

in the last step of Algorithm [T] The result finally follows for 

V 0+. 

To conclude, we prove the initialization of Algorithm [T] 



for the high SU access rate. Let {p 



(0) 



,^(^-1)) and 



(s^°\ . . . , s^^"^') be the sequence of deterministic poli- 
cies and of states returned by Algorithm [T] obtained 
by initializing the algorithm as in the first part of the 
proof. Let Vq = {^eV : n(t, 0, 0) = V t e N(l, T)}, 
Vq = eVf) : ^(s) = 1, Vs £ 5k}, and A^o — max{i e 
{0, . . . ,iV - 1} : I^,(AiW) < eth}- We prove that ^(^o+i) e 
f>o, i.e., fl^^«+'^^s) = 1, Vs e Sk- From the definition of 
2?o ^nd the construction of the algorithm, it follows that, for 
i > No, /^^^H^) = Ij V s G Sk- Moreover, from Lemma |7] 



eth- Hence, for the high SU access rate e > 



eth, the optimal policy /i* obeys ^*{s) = l,Vs e 5k. Then, 
letting Ui = {fi ^ U : /i(s) = l,Vs G Sk}, the optimization 
problem ([TtT i can be restricted to the set of randomized policies 
fi E Ui C U when e > cth- Equivalently, secondary accesses 
taking place in Stj can be obtained by initializing the algorithm 



with ^(")(s) = 0, s G Sv, = 1, s G Sk, 5;^/, = Sij. 

The initialization of Algorithm [T] is thus proved. 

Proof of ^(^0+1) G Vq: We prove by induction that ^^'^ G 
Vq\Vq, \fi < No and ^(^o+i) g Vo- Assume that, for some 
i > 0, ^(^^ G Vo\Vo, Vj < i. From LemmaH it follows that 
No > i. This clearly holds for i — 0. We show that this implies 
that either G 2?o \ i'o, hence A^'o > i, thus proving the 

induction step, or G Vo, hence A^'o = i, thus proving 

the property. The result follows since A'o < 1 + l^j < oo (i.e., 
i = No is reached within a finite number of steps). 



From Lemma [8] 7y (i) (s) — T^k > 0,Vs G Sk S- 



(i) 
idle 



and ri^(i) {t, 0, U) < T^k, Vt G N(l, D), hence, from the main 
iteration stage of the algorithm it follows that G Vo- 

In particular, if G 2?o \ T^o, then No > i from Lemma 

|7] On the other hand, if G Vo, then, from Lemma |7] 

A^o = The property is thus proved. ■ 



Lemma 7. 

Ws{^I) < eth, V^i (^Vo\Vo and Ws{^l) - eth, V^* e Vo. 



□ 



Proof: Let /i G Vo- Since the states (t, b, U) with 6 > 
are not accessible from (1,0, U) under /i, the transmission 
probability ^(<,6,U), 5 > 0, does not affect Ws{n). Then, 
from Def. [T] we have Ws(m) — ^th- 

Let fi E V \ Vo- Letting 5^ = {s G 5k : fJ.{s) — 0}, we 
have that /i + X^sgs ^ o- Finally, since every s G 5^ is 
accessible from (1, 0, U) under fi, and 5^ is non-empty, from 
Lemma |5] in App. |A] and the previous case, it follows that 

Lemma 8. Let^ieU such that n{t, 0, U) = V < G N(l, D). 
Then, T]f,{t, 0, U) < T,k and 0, K) = T.k, Vt □ 

Proq/.- Let fieU such that ^(i, 0, U) = V t G N(l, D). 
It follows that the states {t, b, U) with 6 > are not accessible, 
hence their steady state probability satisfies 7r^(t, 6,U) = 
0, V V > 0. It is then straightforward to show, by 
using the recursion dlB, that Gp(t, 0,U) = TsKVp(t, 0, K), 
G^(t,0,K) = T,KV,,(t,0,K) and Tsi^l) = T.kVK.Cm)- 
Then, using these expressions, the recursion (l3ll and Lemma 
|6] we obtain ri^{t, 0,K) — Tsk and 

TsKV'^{t,0,\J)-G'^{t,0,V) 



(51) 



v;,(t,o,u)-D;,(t,o,u)M^.(M) 

We now prove that ?7^(t, 0,U) < Tsk, which proves the 
lemma. Equivalently, using Lemma |5] in App. |A] and ( |3TI) . 
we prove that 



T.kV' (t, 0, U) - G'Jt, 0, U) - {TsK - T, 



(52) 



„(A) 



Ps,buf [T,KV,,(t, 1, U) - G^(t, 1, U)] > 0. 
Letting

Mf,{t,b) ^b{TsK-Tsv) (53) 
+ <'ps,buf[r,KV^(t,fo,U)-G^(t,6,U)] >0, Vi,6> 1, 

(|52T i is equivalent to M^{t, 1) > 0. We now prove by induction 
that Mf^{t,b) > 0, V t, 6 > 1, yielding (|52]i as a special 
case when 6=1. For t = _D + 1 we have M^{D + l,b) = 
b{TsK-Tsv) > 0, since T^k > and b > 1. Now, let t < L> 
and assume M^{t + 1,6) > 0. Using dSTT l. after algebraic 
manipulation we obtain 

M^(t, b) = b{TsK - Tsu) + qi^'>PsMiKt, b, U)(r,K - T^u) 



bR 



sU 



l-/i(t,6,U)<?(^)-(l-M(t,6,U))<zm 

+ Pr^(t + 1, 6, \J\t, b, U)[M^(t + 1,6)- 6(r,K - T,u)] 

+ Pr^(t +1,6+1, U|t, 6, U)Af^(t + 1, 6) 

- Pr^(t +1,6 + 1, U|t, 6, U)(6 + 1)(T,K - T^u)- (54) 

Finally, since + 1,6) > by the induction hypothesis, 

using inequality (|9|l we obtain 



M^(t, 6) > PsMibRsV (l - qip^) 



+ PsmMv{1 - b, \J))q^'J{q^pP - (?«) > 0, 
which proves the induction step. The lemma is proved. 



(55) 



Appendix D 

Proof of Lemma \3}i Let T) <Z U the set of all the 
deterministic (non-randomized) policies. Let 

i>= {/i G D : Ai(i,6,U) = l,Vi,6 < 6(t); 

H{t,b,\]) = 0,Vi,6 > 6(<); /i(s) = l,s G 5k; 
3 6(-) : 6(t+ 1) < b{t) Vt}. 

By inspection, the policies in the sequences of the form (22) belong to $\mathcal{V}$ for all $i \in \mathbb{N}(0,N-1)$. Therefore, the first part of the lemma states that $\mu^{(i)} \in \mathcal{V}$, $\forall i \in \mathbb{N}(0,N-1)$. We prove this property by induction. Namely, we show that $\mu^{(i)} \in \mathcal{V} \Rightarrow \mu^{(i+1)} \in \mathcal{V}$. Then, since $\mu^{(0)} \in \mathcal{V}$ (initialization of Algorithm 1), it follows that $\mu^{(i)} \in \mathcal{V}$, $\forall i$. Let $\mu^{(i)} \in \mathcal{V}$, i.e., $\mu^{(i)}$ is given by (22) for some $b^{(i)}(t)$ non-increasing in $t$. The set of idle states is then given by



$$\mathcal{S}_{\mathrm{idle}}^{(i)} = \big\{(t,b,\mathrm{U}) \in \mathcal{S}_{\mathrm{U}} : t \in \mathbb{N}(1,D),\ b \ge b^{(i)}(t)\big\}. \tag{56}$$



We then prove that, under the hypotheses of the lemma, $\eta_{\mu^{(i)}}(t,b,\mathrm{U}) \ge \eta_{\mu^{(i)}}(t,b+1,\mathrm{U})$ and $\eta_{\mu^{(i)}}(t,b,\mathrm{U}) \ge \eta_{\mu^{(i)}}(t+1,b,\mathrm{U})$, $\forall (t,b,\mathrm{U}) \in \mathcal{S}_{\mathrm{idle}}^{(i)}$. It follows that the SU access efficiency is maximized by the state in the idle set $\mathcal{S}_{\mathrm{idle}}^{(i)}$ with the lowest value of the primary ARQ state $t$, among the states with the same buffer occupancy $b$, and with the smallest number of buffered received signals $b$, among the states with the same primary ARQ state $t$. Therefore, in the main iteration stage of the algorithm, the SU access efficiency is maximized by $s^{(i)} = \arg\max_{s \in \mathcal{S}_{\mathrm{idle}}^{(i)}} \eta_{\mu^{(i)}}(s)$, where $s^{(i)} = (t,b,\mathrm{U})$ is such that $\tau \ge t$, $\beta \ge b$, $\forall (\tau,\beta,\mathrm{U}) \in \mathcal{S}_{\mathrm{idle}}^{(i)}$. By inspection, we have that $\mu^{(i+1)} = \mu^{(i)} + \delta_{s^{(i)}} \in \mathcal{V}$, hence the induction step is proved.
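The structural part of this step can be illustrated with a minimal sketch. It assumes that a policy in $\mathcal{V}$ is stored as a list of thresholds $b(1),\dots,b(D)$, with the SU idle in $(t,b,\mathrm{U})$ whenever $b \ge b(t)$; the function names are hypothetical, and any cap on the buffer occupancy is ignored. The sketch lists the idle states that are not dominated in both $t$ and $b$ by another idle state, and checks that activating one of them keeps the thresholds non-increasing.

```python
from typing import List, Tuple


def is_threshold_policy(b: List[int]) -> bool:
    """True if the thresholds b(1..D), stored as b[0..D-1], are non-increasing in t."""
    return all(b[k + 1] <= b[k] for k in range(len(b) - 1))


def corner_idle_states(b: List[int]) -> List[Tuple[int, int]]:
    """Idle states (t, b(t)) with no other idle state having both a
    smaller-or-equal ARQ state and a smaller-or-equal buffer occupancy."""
    corners = []
    for k, thr in enumerate(b):
        # (t, b(t)) is a corner if t = 1 or the previous threshold is strictly larger.
        if k == 0 or b[k - 1] > thr:
            corners.append((k + 1, thr))
    return corners


def activate(b: List[int], state: Tuple[int, int]) -> List[int]:
    """Activate the boundary idle state (t, b(t)): the threshold at t grows by one."""
    t, occupancy = state
    if occupancy != b[t - 1]:
        raise ValueError("state is not on the idle boundary")
    new_b = list(b)
    new_b[t - 1] += 1
    return new_b


if __name__ == "__main__":
    thresholds = [3, 3, 1, 0]  # hypothetical example with D = 4
    for s in corner_idle_states(thresholds):
        print(s, activate(thresholds, s), is_threshold_policy(activate(thresholds, s)))
```

Activating a boundary state that is not a corner, such as $(2,3)$ in the example, would break the monotonicity of the thresholds, which is why the two monotonicity properties of the access efficiency matter for the argument.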



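We thus need to prove the induction step, i.e., letting $\mu^{(i)} \in \mathcal{V}$, we show that

$$\begin{aligned}
\eta_{\mu^{(i)}}(t,b,\mathrm{U}) &\ge \eta_{\mu^{(i)}}(t,b+1,\mathrm{U}), && \forall (t,b,\mathrm{U}) \in \mathcal{S}_{\mathrm{idle}}^{(i)}, \\
\eta_{\mu^{(i)}}(t,b,\mathrm{U}) &\ge \eta_{\mu^{(i)}}(t+1,b,\mathrm{U}), && \forall (t,b,\mathrm{U}) \in \mathcal{S}_{\mathrm{idle}}^{(i)}.
\end{aligned} \tag{57}$$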

To this end, note that, in the degenerate cognitive radio network scenario, the primary ARQ process is not affected by the SU access scheme, hence, using the notation in App. A, $D'_{\mu^{(i)}}(t,b,\mathrm{U}) = 0$. By the definition of SU access efficiency (6), we thus obtain

$$\eta_{\mu^{(i)}}(t,b,\mathrm{U}) = \frac{G'_{\mu^{(i)}}(t,b,\mathrm{U})}{V'_{\mu^{(i)}}(t,b,\mathrm{U})}, \tag{58}$$



where, using (O, (I32H34| |, (|36T l and 

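$$\begin{aligned}
G'_{\mu^{(i)}}(t,b,\mathrm{U}) ={}& T_{sU} + \big(q_{ps}^{(I)} - q_{ps}^{(A)}\big)\, b R_{sU} \\
&+ q_{pp}\big(q_{ps}^{(A)} - p_{s,\mathrm{buf}} - q_{ps}^{(I)}\big)\, G_{\mu^{(i)}}(t+1,b,\mathrm{U}) \\
&+ q_{pp}\, p_{s,\mathrm{buf}}\, G_{\mu^{(i)}}(t+1,b+1,\mathrm{U}) \\
&+ q_{pp}\big(q_{ps}^{(I)} - q_{ps}^{(A)}\big)\, G_{\mu^{(i)}}(t+1,0,\mathrm{K}),
\end{aligned} \tag{59}$$

$$\begin{aligned}
V'_{\mu^{(i)}}(t,b,\mathrm{U}) ={}& 1 + q_{pp}\big(q_{ps}^{(A)} - p_{s,\mathrm{buf}} - q_{ps}^{(I)}\big)\, V_{\mu^{(i)}}(t+1,b,\mathrm{U}) \\
&+ q_{pp}\, p_{s,\mathrm{buf}}\, V_{\mu^{(i)}}(t+1,b+1,\mathrm{U}) \\
&+ q_{pp}\big(q_{ps}^{(I)} - q_{ps}^{(A)}\big)\, V_{\mu^{(i)}}(t+1,0,\mathrm{K}).
\end{aligned} \tag{60}$$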

Using the fact that $\mu^{(i)}(\tau,\beta,\mathrm{U}) = 0$, $\forall \tau \ge t$, $\beta \ge b$, it can be proved that

$$V_{\mu^{(i)}}(\tau,\beta,\mathrm{U}) = A_1(\tau) - A_0(\tau), \tag{61}$$
$$G_{\mu^{(i)}}(\tau,\beta,\mathrm{U}) = \big(1 - q_{ps}^{(I)}\big)\,\beta R_{sU} A_0(\tau) + T_{sK}\big(A_1(\tau) - A_0(\tau)\big), \tag{62}$$
$$V_{\mu^{(i)}}(\tau,0,\mathrm{K}) = A_1(\tau), \tag{63}$$
$$G_{\mu^{(i)}}(\tau,0,\mathrm{K}) = T_{sK} A_1(\tau), \tag{64}$$

where $A_0(\cdot)$ and $A_1(\cdot)$ are defined in (27) and (28), respectively. The expressions (61)-(64) can be easily verified by induction, starting from $\tau = D+1$ backward. In fact, for $\tau = D+1$, we have $A_0(D+1) = A_1(D+1) = 0$, hence we obtain $V_{\mu^{(i)}}(D+1,\beta,\mathrm{U}) = G_{\mu^{(i)}}(D+1,\beta,\mathrm{U}) = V_{\mu^{(i)}}(D+1,0,\mathrm{K}) = G_{\mu^{(i)}}(D+1,0,\mathrm{K}) = 0$, which is consistent with Def. 3. The induction step can be proved by inspection, using the recursive expression (31) and the fact that $\mu^{(i)}(\tau,\beta,\mathrm{U}) = 0$, $\forall \tau \ge t$, $\beta \ge b$. Substituting the expressions (61)-(64) in (59) and (60), we obtain

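$$\begin{aligned}
G'_{\mu^{(i)}}(t,b,\mathrm{U}) ={}& T_{sU} + q_{pp}\, p_{s,\mathrm{buf}}\big(1 - q_{ps}^{(I)}\big) R_{sU} A_0(t+1) \\
&- \big(q_{ps}^{(A)} - q_{ps}^{(I)}\big)\, b R_{sU}\Big[1 - q_{pp}\big(1 - q_{ps}^{(I)}\big) A_0(t+1)\Big] \\
&- q_{pp}\big(q_{ps}^{(A)} - q_{ps}^{(I)}\big)\, T_{sK} A_0(t+1),
\end{aligned} \tag{65}$$

$$V'_{\mu^{(i)}}(t,b,\mathrm{U}) = 1 - q_{pp}\big(q_{ps}^{(A)} - q_{ps}^{(I)}\big) A_0(t+1). \tag{66}$$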
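The substitution leading to (65) and (66) can be spot-checked numerically. The sketch below assumes the following reading of the expressions: the one-step forms (59)-(60) with transition weights $q_{pp}(q_{ps}^{(A)} - p_{s,\mathrm{buf}} - q_{ps}^{(I)})$, $q_{pp}\,p_{s,\mathrm{buf}}$ and $q_{pp}(q_{ps}^{(I)} - q_{ps}^{(A)})$, and the idle-state values (61)-(64). The parameter values are random and purely illustrative, and `A0`, `A1` stand for $A_0(t+1)$, $A_1(t+1)$ treated as free numbers rather than computed from (27)-(28).

```python
import random


def substitution_gap(rng: random.Random):
    """Return |(59) - (65)| and |(60) - (66)| for one random parameter draw."""
    q_pp = rng.uniform(0.05, 0.95)
    q_i = rng.uniform(0.05, 0.90)      # q_ps^(I)
    q_a = rng.uniform(q_i, 0.95)       # q_ps^(A) >= q_ps^(I)
    p_buf = rng.uniform(0.0, q_a)      # p_{s,buf}
    t_su = rng.uniform(0.1, 1.0)
    t_sk = t_su + rng.uniform(0.1, 1.0)
    r_su = rng.uniform(0.1, 2.0)
    a0 = rng.uniform(0.0, 3.0)         # stands for A_0(t+1)
    a1 = a0 + rng.uniform(0.0, 3.0)    # stands for A_1(t+1)
    b = rng.randint(0, 5)

    # Idle-state values (61)-(64), evaluated at t+1.
    def v_u(beta):
        return a1 - a0

    def g_u(beta):
        return (1 - q_i) * beta * r_su * a0 + t_sk * (a1 - a0)

    v_k = a1
    g_k = t_sk * a1

    # One-step expressions (59) and (60).
    w_stay = q_pp * (q_a - p_buf - q_i)
    w_buf = q_pp * p_buf
    w_known = q_pp * (q_i - q_a)
    g_prime = t_su + (q_i - q_a) * b * r_su + w_stay * g_u(b) + w_buf * g_u(b + 1) + w_known * g_k
    v_prime = 1 + w_stay * v_u(b) + w_buf * v_u(b + 1) + w_known * v_k

    # Closed forms (65) and (66).
    g_closed = (t_su + q_pp * p_buf * (1 - q_i) * r_su * a0
                - (q_a - q_i) * b * r_su * (1 - q_pp * (1 - q_i) * a0)
                - q_pp * (q_a - q_i) * t_sk * a0)
    v_closed = 1 - q_pp * (q_a - q_i) * a0
    return abs(g_prime - g_closed), abs(v_prime - v_closed)


if __name__ == "__main__":
    rng = random.Random(0)
    print(max(max(substitution_gap(rng)) for _ in range(1000)))
```

Since the three transition weights sum to zero, the $A_1$ terms cancel in both expressions, which is why only $A_0(t+1)$ survives in (65) and (66).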



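Proof of $\eta_{\mu^{(i)}}(t,b,\mathrm{U}) \ge \eta_{\mu^{(i)}}(t,b+1,\mathrm{U})$

By substituting (65) and (66) in (58), and noticing that $V'_{\mu^{(i)}}(t,b,\mathrm{U}) = V'_{\mu^{(i)}}(t,b+1,\mathrm{U})$ from (66) and $V'_{\mu^{(i)}}(t,b,\mathrm{U}) > 0$ (from Lemma 5 with $D'_{\mu^{(i)}}(s) = 0$), the condition $\eta_{\mu^{(i)}}(t,b,\mathrm{U}) \ge \eta_{\mu^{(i)}}(t,b+1,\mathrm{U})$ is equivalent to $G'_{\mu^{(i)}}(t,b,\mathrm{U}) \ge G'_{\mu^{(i)}}(t,b+1,\mathrm{U})$, which is readily verified from (65), since

$$\begin{aligned}
G'_{\mu^{(i)}}(t,b,\mathrm{U}) - G'_{\mu^{(i)}}(t,b+1,\mathrm{U})
&= \big(q_{ps}^{(A)} - q_{ps}^{(I)}\big)\Big[1 - q_{pp}\big(1 - q_{ps}^{(I)}\big) A_0(t+1)\Big] R_{sU} \\
&\ge \big(q_{ps}^{(A)} - q_{ps}^{(I)}\big)\, \frac{1 - q_{pp}}{1 - q_{pp}\, q_{ps}^{(I)}}\, R_{sU} \ \ge\ 0,
\end{aligned} \tag{67}$$

where the first inequality follows from the fact that $A_0(t+1) \le \frac{1}{1 - q_{pp}\, q_{ps}^{(I)}}$, the second from $q_{ps}^{(I)} \le q_{ps}^{(A)}$.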
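The bound used in this step can be checked numerically. The sketch below assumes the closed form $G'(t,b,\mathrm{U}) = T_{sU} + q_{pp}\,p_{s,\mathrm{buf}}(1-q_{ps}^{(I)})R_{sU}A_0 - (q_{ps}^{(A)}-q_{ps}^{(I)})\,bR_{sU}[1-q_{pp}(1-q_{ps}^{(I)})A_0] - q_{pp}(q_{ps}^{(A)}-q_{ps}^{(I)})T_{sK}A_0$ with $A_0 = A_0(t+1)$, and the upper bound $A_0 \le 1/(1 - q_{pp}\,q_{ps}^{(I)})$. Function names and parameter ranges are hypothetical.

```python
import random


def g_prime(b, q_pp, q_i, q_a, p_buf, t_su, t_sk, r_su, a0):
    """Assumed closed form of G'(t, b, U), with a0 standing for A_0(t+1)."""
    return (t_su + q_pp * p_buf * (1 - q_i) * r_su * a0
            - (q_a - q_i) * b * r_su * (1 - q_pp * (1 - q_i) * a0)
            - q_pp * (q_a - q_i) * t_sk * a0)


def check_bound(rng: random.Random):
    """Return (difference, lower bound) for one random parameter draw."""
    q_pp = rng.uniform(0.05, 0.95)
    q_i = rng.uniform(0.05, 0.90)                        # q_ps^(I)
    q_a = rng.uniform(q_i, 0.95)                         # q_ps^(A) >= q_ps^(I)
    p_buf = rng.uniform(0.0, q_a)
    t_su = rng.uniform(0.1, 1.0)
    t_sk = t_su + rng.uniform(0.1, 1.0)
    r_su = rng.uniform(0.1, 2.0)
    a0 = rng.uniform(0.0, 1.0) / (1.0 - q_pp * q_i)      # any value within the bound
    b = rng.randint(0, 5)

    args = (q_pp, q_i, q_a, p_buf, t_su, t_sk, r_su, a0)
    diff = g_prime(b, *args) - g_prime(b + 1, *args)
    lower = (q_a - q_i) * (1 - q_pp) / (1 - q_pp * q_i) * r_su
    return diff, lower


if __name__ == "__main__":
    rng = random.Random(1)
    print(all(d >= lo - 1e-12 >= -1e-12 for d, lo in (check_bound(rng) for _ in range(1000))))
```

The lower bound is attained when $A_0(t+1)$ equals its upper limit, since $1 - q_{pp}(1-q_{ps}^{(I)})/(1-q_{pp}q_{ps}^{(I)}) = (1-q_{pp})/(1-q_{pp}q_{ps}^{(I)})$.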

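Proof of $\eta_{\mu^{(i)}}(t,b,\mathrm{U}) \ge \eta_{\mu^{(i)}}(t+1,b,\mathrm{U})$

Since $V'_{\mu^{(i)}}(t,b,\mathrm{U}) > 0$, the condition $\eta_{\mu^{(i)}}(t,b,\mathrm{U}) \ge \eta_{\mu^{(i)}}(t+1,b,\mathrm{U})$ is equivalent to

$$\begin{aligned}
& G'_{\mu^{(i)}}(t,b,\mathrm{U})\big(V'_{\mu^{(i)}}(t+1,b,\mathrm{U}) - V'_{\mu^{(i)}}(t,b,\mathrm{U})\big) \\
&\quad \ge V'_{\mu^{(i)}}(t,b,\mathrm{U})\big(G'_{\mu^{(i)}}(t+1,b,\mathrm{U}) - G'_{\mu^{(i)}}(t,b,\mathrm{U})\big).
\end{aligned} \tag{68}$$

Using (65) and (66), after algebraic manipulation we obtain the equivalent condition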



„(i) 

^ps 



1; 



i^n A. > 0, 



(A) ' ,i) Ps buf, which is an hypothesis of the 



where we have used the fact that T^k = ^sRsV + TsU + 
$p_{s,\mathrm{buf}} R_{sU}$. Since we require this condition to hold $\forall b \ge 0$ and the left-hand expression is minimized by $b = 0$, the condition (69) should be satisfied for $b = 0$, yielding the equivalent

l-g<*' 

condition < 
lemma. 

It is thus proved that the sequence of policies returned by Algorithm 1 has the structure defined by (22), where $b^{(i)}(t)$ satisfies the inequality (24). Moreover, the inequality (25) holds since, by the algorithm construction, $\mu^{(i+1)}$ is obtained from $\mu^{(i)}$ by "activating" one additional state from the set of idle states $\mathcal{S}_{\mathrm{idle}}^{(i)}$.

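The second part of the lemma states that the threshold function of the last policy in the sequence equals $b_{\max}(t)$, where $b_{\max}(t)$ is given by (26). This is a consequence of the fact that Algorithm 1 stops if the SU access efficiency becomes non-positive, i.e., $\eta_{\mu^{(i)}}(s) \le 0$, $\forall s \in \mathcal{S}_{\mathrm{idle}}^{(i)}$. From (58), and since $V'_{\mu^{(i)}}(t,b,\mathrm{U}) > 0$, this condition is equivalent to $G'_{\mu^{(i)}}(t,b,\mathrm{U}) \le 0$, $\forall (t,b,\mathrm{U}) \in \mathcal{S}_{\mathrm{idle}}^{(i)}$. By using (65) and by solving $G'_{\mu^{(i)}}(t,b,\mathrm{U}) \le 0$ with respect to $b$, the result follows.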